Paulmzr opened this issue 3 years ago
Hi, all. I am trying to reproduce the word-level oracle results of this paper on the WMT EN-DE dataset. I trained the Transformer model for 150k steps with #GPUs = 4, #Freq = 2, #Toks = 4096 and saved a checkpoint every 5k steps. The last 5 checkpoints are averaged to obtain the final model.
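In case it matters for comparison, this is roughly how I do the averaging (a minimal sketch; the checkpoint filenames are placeholders, and storing the weights under a `"model"` key follows fairseq's convention, which is an assumption about this repo):

```python
import torch

def average_checkpoints(paths):
    """Average the parameter tensors of several saved checkpoints."""
    avg = None
    for path in paths:
        # assuming each file stores the weights under a "model" key,
        # as fairseq checkpoints do
        state = torch.load(path, map_location="cpu")["model"]
        if avg is None:
            avg = {k: v.clone().float() for k, v in state.items()}
        else:
            for k, v in state.items():
                avg[k] += v.float()
    return {k: (v / len(paths)).to(state[k].dtype) for k, v in avg.items()}

# the last 5 checkpoints, saved every 5k steps up to 150k (paths are placeholders)
paths = [f"checkpoints/checkpoint_{step}.pt" for step in range(130000, 150001, 5000)]
averaged = average_checkpoints(paths)
```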
However, I find it difficult to reproduce the reported results with any `decay_k`. (The Gumbel noise parameter is fixed to 0.8, as recommended in this repo.) Any suggestions on how to tune `decay_k` to reproduce the results? Thanks a lot!
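For comparing schedules, here is my understanding of what these two knobs do (a sketch only: I am assuming `decay_k` is the μ of the paper's decay ε_i = μ / (μ + exp(i / μ)), and that the 0.8 is a Gumbel sampling temperature; both mappings to this repo's flags are assumptions on my part):

```python
import math
import torch

def ground_truth_prob(i, decay_k):
    """Decay from the paper: eps_i = k / (k + exp(i / k)).
    With probability eps_i the ground-truth token is fed at step/epoch i;
    otherwise the oracle token is fed. Exponent clamped to avoid overflow."""
    return decay_k / (decay_k + math.exp(min(i / decay_k, 700.0)))

def gumbel_oracle(logits, tau=0.8):
    """Word-level oracle via Gumbel-Max sampling: argmax(logits / tau + g)
    draws a token from softmax(logits / tau), so smaller tau is closer to
    greedy selection. Treating the repo's 0.8 as this tau is my assumption."""
    u = torch.rand_like(logits)
    g = -torch.log(-torch.log(u + 1e-9) + 1e-9)  # standard Gumbel(0, 1) noise
    return (logits / tau + g).argmax(dim=-1)
```

One thing I could not pin down is whether i counts epochs (as in the paper) or updates; with 150k updates those give very different curves, which might explain the sensitivity to `decay_k`.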
Hi, can you provide your reproduced results on WMT14 EN-DE? The same thing happened to me.
Hi, I used similar settings to yours and the decay schedule from the author's example, and I also failed to reproduce the word-level oracle improvement. Have you got any idea about this problem?