tensorflow / nmt

TensorFlow Neural Machine Translation Tutorial
Apache License 2.0

Why are most translations unk symbols in evaluation during training? #374

Open Ak-sky opened 6 years ago

Ak-sky commented 6 years ago

I am doing incremental training of my Hi-En NMT system, but even after 300K steps most of the translations come out with unk symbols. How can I overcome this so that I get proper translations?
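One possible cause worth ruling out is that many tokens in the new corpus are simply missing from the reused vocabulary, since every out-of-vocabulary token is mapped to unk. A minimal sketch for measuring that, assuming the vocab file holds one token per line and the corpus is already tokenized (or subword-encoded) exactly as the training data is; the function name `oov_report` and the toy data are hypothetical and not part of the nmt repo:

```python
from collections import Counter


def oov_report(vocab_tokens, corpus_lines, top_k=10):
    """Return (oov_rate, most_common_missing_tokens) for a tokenized corpus.

    vocab_tokens: iterable of vocabulary entries (one per vocab-file line).
    corpus_lines: iterable of whitespace-tokenized sentences.
    """
    vocab = set(vocab_tokens)
    total = 0
    missing = Counter()
    for line in corpus_lines:
        for token in line.split():
            total += 1
            if token not in vocab:
                missing[token] += 1
    oov_count = sum(missing.values())
    rate = oov_count / total if total else 0.0
    return rate, missing.most_common(top_k)


# Toy data standing in for the real vocab and training files (assumption).
toy_vocab = ["<unk>", "<s>", "</s>", "the", "cat"]
toy_corpus = ["the cat sat", "the dog"]
rate, top_missing = oov_report(toy_vocab, toy_corpus)
print(f"OOV rate: {rate:.1%}")
print("Most frequent missing tokens:", top_missing)
```

If the rate is high on the new training data, the reused vocabulary probably does not match how the new corpus was preprocessed, for example because subword codes were learned afresh on the new data.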

I want to make sure that the steps I plan to follow for incremental training are correct:
- Generate the preprocessed data (but not the vocab) using the wmt shell script from the nmt repo.
- Reuse the vocabulary from the previously preprocessed data.
- Copy the checkpoint files translate.ckpt-340000.data-00000-of-00001, translate.ckpt-340000.index, and translate.ckpt-340000.meta to the new out_dir.
- Use the dev/test set from the previously preprocessed data.
- Change "num_train_steps" to 350000 in the JSON file (wmt16_gnmt_4_layer.json).

Please let me know whether the steps above can be used for incremental training on the new corpus.
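For reference, the file-handling part of these steps can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the helper name `prepare_incremental_run` and the example paths are hypothetical, and the sketch also copies TensorFlow's `checkpoint` state file when one exists, because `tf.train.latest_checkpoint` relies on that file to locate the newest checkpoint in a directory.

```python
import json
import shutil
from pathlib import Path


def prepare_incremental_run(old_dir, new_dir, step, config_path, new_num_train_steps):
    """Copy one checkpoint into a new out_dir and raise num_train_steps.

    Returns the list of file names that were copied.
    """
    old_dir, new_dir = Path(old_dir), Path(new_dir)
    new_dir.mkdir(parents=True, exist_ok=True)

    # Matches translate.ckpt-<step>.data-*, .index and .meta only.
    prefix = f"translate.ckpt-{step}"
    copied = []
    for src in sorted(old_dir.glob(prefix + ".*")):
        shutil.copy2(src, new_dir / src.name)
        copied.append(src.name)

    # The state file names the latest checkpoint; copy it if present.
    state_file = old_dir / "checkpoint"
    if state_file.exists():
        shutil.copy2(state_file, new_dir / "checkpoint")
        copied.append("checkpoint")

    # Update the step budget in the JSON hyperparameter file in place.
    config_path = Path(config_path)
    config = json.loads(config_path.read_text())
    config["num_train_steps"] = new_num_train_steps
    config_path.write_text(json.dumps(config, indent=2))
    return copied


# Example call (paths are placeholders, not taken from the issue):
# prepare_incremental_run("old_out_dir", "new_out_dir", 340000,
#                         "wmt16_gnmt_4_layer.json", 350000)
```

If the copied `checkpoint` file records absolute paths into the old directory, it may need editing so that it points at the files in the new out_dir.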

frajos100 commented 5 years ago

Facing a similar issue. Can anyone help?