tensorflow / nmt

TensorFlow Neural Machine Translation Tutorial
Apache License 2.0

I get lots of <unk> as translation result for my English-Turkish model #296

Closed: vdtx closed this issue 6 years ago

vdtx commented 6 years ago

I have nearly 200 thousand aligned sentences, and I extracted the vocab from that data. However, when I translate a sentence, the output_infer file contains <unk> <unk> <unk> <unk> <unk> <unk> <unk> <unk> <unk>, sometimes more.

Could this be caused by my vocabulary? What should the structure of the vocab file be? Here is a piece of the training output.
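On the vocab file question: the nmt tutorial code reads a plain-text vocab file with one token per line and expects the special tokens <unk>, <s> and </s> as the first three entries. The sketch below builds such a list from tokenized sentences and reports how many tokens of a text fall outside it. The function names, the whitespace tokenization and the max_size cutoff are assumptions made for illustration; they are not part of the nmt code base.

```python
from collections import Counter

# Special tokens expected at the top of the vocab file, in this order.
SPECIAL_TOKENS = ["<unk>", "<s>", "</s>"]


def build_vocab(sentences, max_size=None):
    """Return special tokens followed by words sorted by descending frequency.

    `sentences` is an iterable of whitespace-tokenized strings.
    `max_size` (hypothetical parameter) caps the total vocab length.
    """
    counts = Counter(word for line in sentences for word in line.split())
    # Avoid listing a special token twice if it appears in the data.
    for tok in SPECIAL_TOKENS:
        counts.pop(tok, None)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    limit = None if max_size is None else max(max_size - len(SPECIAL_TOKENS), 0)
    return SPECIAL_TOKENS + [word for word, _ in ranked][:limit]


def oov_rate(sentences, vocab):
    """Fraction of tokens in `sentences` that are missing from `vocab`."""
    known = set(vocab)
    tokens = [word for line in sentences for word in line.split()]
    if not tokens:
        return 0.0
    return sum(word not in known for word in tokens) / len(tokens)


def write_vocab(vocab, path):
    """Write one token per line, UTF-8 encoded."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(vocab) + "\n")
```

A high oov_rate on the training or test text would point at the vocabulary; a low one suggests the <unk> output has another cause, such as too little training.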

src: Makedonya Cumhurbaşkanı Gjorge İvanov Alman mevkidaşı Horst Köhler ile 14 Eylül Pazartesi günü Berlin'de gerçekleşen görüşme sonrasında yaptığı açıklamada, Almanya'nın Makedonya'nın AB üyelik hedefini desteklediğini söyledi.
ref: Germany supports Macedonia's bid to join the EU, Macedonian President Gjorge Ivanov said after meeting his German counterpart Horst Koehler in Berlin on Monday (September14th).
nmt: <unk> President Gjorge Ivanov said that Germany supports <unk> EU integration <unk> after meeting with the German <unk> <unk> on Monday <unk> <unk> after a meeting in <unk>

src: Finlandiya örneği, inşaatta yaşanan üç ila beş yıllık gecikmelerin [enerji] fiyatlarını 2 milyardan 10 milyar avroya çıkardığını göstermektedir." diye açıklıyor.
ref: The example of Finland shows that three to five years of construction delays increased [energy] prices from 2 billion to 10 billion euros."
nmt: lots of <unk>

It seems to be working, but as I said, the translation result is mostly <unk>. By the way, I am very new to this.
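To put a number on "lots of <unk>", a small sketch like the following counts the share of <unk> tokens in a set of output lines. The function name is hypothetical, and it assumes the output is whitespace-tokenized with <unk> written exactly like that.

```python
def unk_fraction(lines, unk_token="<unk>"):
    """Return the fraction of whitespace-separated tokens equal to `unk_token`."""
    total = 0
    unk = 0
    for line in lines:
        tokens = line.split()
        total += len(tokens)
        unk += sum(token == unk_token for token in tokens)
    # An empty input has no tokens, so report 0.0 instead of dividing by zero.
    return unk / total if total else 0.0
```

It could be run on the inference output by passing an open file, for example `unk_fraction(open("output_infer", encoding="utf-8"))`, and compared across checkpoints to see whether the share drops as training continues.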

vdtx commented 6 years ago

I think the problem was that --num_train_steps was set too low. I raised the value and now it works fine.
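A rough way to judge whether a given --num_train_steps is enough is to convert it into passes over the training data. The sketch below does that arithmetic. The function name is hypothetical, and the batch size and step count in the example are illustrative assumptions, not the settings used in this thread; only the figure of about 200 thousand sentences comes from the post above.

```python
def epochs_covered(num_train_steps, batch_size, num_sentences):
    """Approximate number of passes over the data for a given step count.

    Each training step consumes one batch, so the total number of
    sentences seen is num_train_steps * batch_size.
    """
    if num_sentences <= 0:
        raise ValueError("num_sentences must be positive")
    return num_train_steps * batch_size / num_sentences


# Illustrative values (assumed): 12000 steps, batch size 128,
# and roughly 200000 aligned sentences as mentioned in the post.
example_epochs = epochs_covered(12000, 128, 200000)
```

If the result is only a handful of epochs, the model may simply not have seen the data often enough, which would be consistent with raising the step count fixing the output.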