Paulmzr opened this issue 3 years ago
Hi, all. I am trying to reproduce the word-level oracle results of this paper on the WMT EN-DE dataset. I trained the Transformer model for 150k steps with #GPUs = 4, #Freq = 2, #Toks = 4096 and saved a checkpoint every 5k steps. The last 5 checkpoints are averaged to obtain the final model.
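In case it matters for comparison, this is roughly how I do the averaging (a minimal sketch; the checkpoint filenames are placeholders, and storing the weights under a `"model"` key follows fairseq's convention, which is an assumption about this repo):

```python
import torch

def average_checkpoints(paths):
    """Average the parameter tensors of several saved checkpoints."""
    avg = None
    for path in paths:
        # assuming each file stores the weights under a "model" key,
        # as fairseq checkpoints do
        state = torch.load(path, map_location="cpu")["model"]
        if avg is None:
            avg = {k: v.clone().float() for k, v in state.items()}
        else:
            for k, v in state.items():
                avg[k] += v.float()
    return {k: (v / len(paths)).to(state[k].dtype) for k, v in avg.items()}

# the last 5 checkpoints, saved every 5k steps up to 150k (paths are placeholders)
paths = [f"checkpoints/checkpoint_{step}.pt" for step in range(130000, 150001, 5000)]
averaged = average_checkpoints(paths)
```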
However, I find it difficult to reproduce the reported results with any `decay_k`. (The Gumbel noise parameter is fixed to 0.8, as recommended in this repo.) Any suggestions on how to tune `decay_k` to reproduce the results? Thanks a lot!
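For comparing schedules, here is my understanding of what these two knobs do (a sketch only: I am assuming `decay_k` is the μ of the paper's decay ε_i = μ / (μ + exp(i / μ)), and that the 0.8 is a Gumbel sampling temperature; both mappings to this repo's flags are assumptions on my part):

```python
import math
import torch

def ground_truth_prob(i, decay_k):
    """Decay from the paper: eps_i = k / (k + exp(i / k)).
    With probability eps_i the ground-truth token is fed at step/epoch i;
    otherwise the oracle token is fed. Exponent clamped to avoid overflow."""
    return decay_k / (decay_k + math.exp(min(i / decay_k, 700.0)))

def gumbel_oracle(logits, tau=0.8):
    """Word-level oracle via Gumbel-Max sampling: argmax(logits / tau + g)
    draws a token from softmax(logits / tau), so smaller tau is closer to
    greedy selection. Treating the repo's 0.8 as this tau is my assumption."""
    u = torch.rand_like(logits)
    g = -torch.log(-torch.log(u + 1e-9) + 1e-9)  # standard Gumbel(0, 1) noise
    return (logits / tau + g).argmax(dim=-1)
```

One thing I could not pin down is whether i counts epochs (as in the paper) or updates; with 150k updates those give very different curves, which might explain the sensitivity to `decay_k`.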
Hi, can you provide your reproduced results on WMT14 EN-DE? The same thing happened to me.
Hi, I used similar settings to yours and the decay schedule from the author's example, and I also failed to reproduce the word-level oracle improvement. Have you got any idea about this problem?