marian-nmt / marian

Fast Neural Machine Translation in C++
https://marian-nmt.github.io

Transformers | decoding long sentence fails #293

Closed adjouama closed 4 years ago

adjouama commented 4 years ago

I produced an English-Chinese model. However, during decoding it does not translate long sentences. Below are my valid.log and my configuration. Thank you very much in advance.

snukky commented 4 years ago

Could you provide more details on what it means that the model does not translate long sentences? Is there a segfault, or is no output produced? Does this happen for a single sentence, or are you translating a file?

The marian-decoder has a --max-length option, which defaults to 1000. I assume your sentences are not that long after subword segmentation?
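For reference, a minimal decoder config sketch showing these length options; the model and vocab paths here are placeholders, not taken from the issue. By default, sentences longer than max-length are skipped; max-length-crop truncates them to the limit instead:

```yaml
# Hypothetical marian-decoder config sketch; paths are placeholders.
models:
  - model.npz
vocabs:
  - vocab.en.spm
  - vocab.zh.spm
# Length limit is measured after subword segmentation.
max-length: 1000
# Truncate over-long sentences instead of omitting them.
max-length-crop: true
```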

adjouama commented 4 years ago

Thank you for the quick answer, and sorry for the lack of information. Basically, the output stays the same as the input. Sometimes it translates a few words (see the example output below):

The World Telecommunication/ICT indicators database on USB Key and online contains time series data for the years 1960, 1965, 1970 and year from 1975 to 2018 for more than 180 telecommunication/ICT statistics covering fixed-phone networks, mobile-cellular telephone subscriber, quality of service, Internet (including fixed- and mobile broadband subscriber data), traffic, staff, price, investment, investment and investment on ICT access and use by home and persons. Sred population,宏 economic and broadcasting statistics are also included. Data for over 200 economic are available.

I send my long paragraph to marian-server as one sentence. Is that related to the --max-length parameter used in the training process?

Note that short sentences get translated perfectly, with very good quality.
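Since short sentences translate well, one workaround on the client side is to split the paragraph into sentences before sending it, so each translation unit stays short. This is a naive sketch with a regex-based splitter; a real setup would use a proper sentence segmenter, and how the pieces are sent to marian-server (one per line, over its WebSocket interface) is an assumption about the client, not code from this issue:

```python
import re

def split_sentences(paragraph):
    """Naive sentence splitter: break after ., !, or ? followed by
    whitespace. Good enough as a sketch; use a real segmenter
    (e.g. the Moses sentence splitter) in practice."""
    parts = re.split(r'(?<=[.!?])\s+', paragraph.strip())
    return [p for p in parts if p]

if __name__ == "__main__":
    paragraph = ("The database contains time series data. "
                 "Data for over 200 economies are available.")
    # Each sentence would then be sent to the server as its own line,
    # so every translation unit stays well under any length limit.
    for sent in split_sentences(paragraph):
        print(sent)
```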

Thank you in advance,

adjouama commented 4 years ago

Here is another example:

adjouama commented 4 years ago

Issue solved after playing around with the vocabulary size. Thanks!