Reproducing result: WMT German-English bleu score is less than the half of the expected score

molipet commented 6 years ago

Thanks for sharing this great work!

Although, I strictly tried to follow the instructions in the ReadMe, I am unable the reproduce the WMT German-English benchmark results on newstest2015.

Here are my details:

python 3.6.2, Tensorflow 1.5.1
I used the provided nmt/scripts/wmt16_en_de.sh to download and pre-process the data files.
I patched the nmt/standard_hparams/wmt16.json by adding two lines "num_encoder_layers": 4, "num_decoder_layers": 4, in order to avoid the problem described in #264, and #265.
I used the following pre-trained models:
- http://download.tensorflow.org/models/nmt/deen_model_1.zip
- http://download.tensorflow.org/models/nmt/deen_model_2.zip

I got the following inference results for newstest_2015:

deen_model_1 -- real bleu: 11.7, expected bleu: 27.6 (command to run inference: python -m nmt.nmt --src=de --tgt=en --ckpt=deen_model_1/translate.ckpt --hparams_path=nmt/standard_hparams/wmt16.json --out_dir=deen_model_1_output --vocab_prefix=wmt16/vocab.bpe.32000 --inference_input_file=wmt16/newstest2015.tok.bpe.32000.de --inference_output_file=deen_model_1_output/output_infer --inference_ref_file=wmt16/newstest2015.tok.bpe.32000.en)
deen_model_2 -- real bleu: 11.8, expected bleu: 28.9 (command to run inference: python -m nmt.nmt --src=de --tgt=en --ckpt=deen_model_2/translate.ckpt --hparams_path=nmt/standard_hparams/wmt16.json --out_dir=deen_model_2_output --vocab_prefix=wmt16/vocab.bpe.32000 --inference_input_file=wmt16/newstest2015.tok.bpe.32000.de --inference_output_file=deen_model_2_output/output_infer --inference_ref_file=wmt16/newstest2015.tok.bpe.32000.en)

Could you please provide any hint or help what am I doing wrong?

Thank you!

potato1996 commented 6 years ago

The same here. My setup: python3.5 + tf1.8

ajithAI commented 6 years ago

I am also having similar issue. I am translating English to German ( newstest2015.tok.bpe.32000.en ) using "Ours — NMT + GNMT attention (8 layers)" model. The cited Bleu score : 27.6. But I got 21.0 command used : python -m nmt.nmt --src=en --tgt=de --ckpt=../ende_gnmt_model_8_layer/translate.ckpt --hparams_path=nmt/standard_hparams/wmt16_gnmt_8_layer.json --out_dir=../ende_model_gnmt_8_output_news15 --vocab_prefix=tmp/wmt16/vocab.bpe.32000 --inference_input_file=tmp/wmt16/newstest2015.tok.bpe.32000.en --inference_output_file=../ende_model_gnmt_8_output_news15/output_infer --inference_ref_file=tmp/wmt16/newstest2015.tok.bpe.32000.de

TF : 1.8 , Python 2.7 .

Anyone please let me know if I am doing anything not proper !!

Thanks in Advance :)

Hyper-Param File wmt16_gnmt_8_layer.json contains :

{ "attention": "normed_bahdanau", "attention_architecture": "gnmt_v2", "batch_size": 128, "colocate_gradients_with_ops": true, "dropout": 0.2, "encoder_type": "gnmt", "eos": "", "forget_bias": 1.0, "infer_batch_size": 32, "init_weight": 0.1, "learning_rate": 1.0, "max_gradient_norm": 5.0, "metrics": ["bleu"], "num_buckets": 5, "num_layers": 8, "num_encoder_layers": 8, "num_decoder_layers": 8, "num_train_steps": 340000, "decay_scheme": "luong10", "num_units": 1024, "optimizer": "sgd", "residual": true, "share_vocab": false, "subword_option": "bpe", "sos": "<s", # IgnoreThis ( some GIT text edit error ) "src_max_len": 50, "src_max_len_infer": null, "steps_per_external_eval": null, "steps_per_stats": 100, "tgt_max_len": 50, "tgt_max_len_infer": null, "time_major": true, "unit_type": "lstm", "beam_width": 10, "length_penalty_weight": 1.0 }

qwerybot commented 5 years ago

I know it's been a while since you posted but it seems that when I run the download to get the wmt16 data I'm getting a different output from the BPE processing, resulting in a different vocabulary.

I was hoping someone might be able to provide me with their working vocab, inference files etc for English-German

tensorflow / nmt

Reproducing result: WMT German-English bleu score is less than the half of the expected score #341