Closed: stas00 closed this issue 4 years ago.
Yes, I chose checkpoint_top5_average.pt
as explained in the [paper](). For BLEU calculation, we did compound splitting, following prior work, including Vaswani et al. 2017. It typically yields a 0.5+ BLEU improvement. The compound split function is available here.
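The compound splitting mentioned here is commonly done with a single regex pass that inserts a `##AT##-##AT##` marker between hyphenated parts, as in the evaluation scripts following Vaswani et al. 2017. A minimal Python sketch, assuming that convention (whether it matches the exact script linked above is an assumption):

```python
import re

def split_compounds(line: str) -> str:
    """Split hyphenated compounds before BLEU scoring, inserting the
    ##AT##-##AT## marker between the parts (the convention used in
    evaluation scripts following Vaswani et al. 2017)."""
    # Break "word1-word2" into "word1 ##AT##-##AT## word2".
    return re.sub(r"(\S)-(\S)", r"\1 ##AT##-##AT## \2", line)

print(split_compounds("state-of-the-art model"))
# → state ##AT##-##AT## of ##AT##-##AT## the ##AT##-##AT## art model
```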
Thank you very much for sharing these details and the links, @jungokasai!
Hi @jungokasai,
I noticed that on the German side, the Europarl portion of your preprocessed data (wmt16.en-de.deep-shallow.dist.tar.gz) differs from the preprocessed data at https://google.github.io/seq2seq/nmt/ and from the original WMT shared task data (this link). You can easily see this by opening your file alongside the original training-parallel-europarl-v7.tgz.
Do you know why this happens? Thanks a lot.
Best,
Sorry if I'm misunderstanding you, but it should be different because that data is the result of knowledge distillation from a transformer large model.
Could you please help me replicate the reported scores? If I follow your instructions I don't get the same scores.
What I did:
So it appears that checkpoint_last.pt gets the best score, not checkpoint_best.pt, but it's still below the advertised score. What am I doing wrong?
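For context, the checkpoint_top5_average.pt mentioned earlier in the thread is presumably produced by parameter averaging across saved checkpoints (fairseq ships scripts/average_checkpoints.py for the real thing, which operates on torch state dicts). A dependency-free sketch of the averaging step, with plain float lists standing in for tensors:

```python
def average_state_dicts(state_dicts):
    """Element-wise average of several model state dicts.
    Real checkpoints hold torch tensors; plain float lists stand in here."""
    n = len(state_dicts)
    return {
        key: [sum(sd[key][i] for sd in state_dicts) / n
              for i in range(len(state_dicts[0][key]))]
        for key in state_dicts[0]
    }

# Averaging two toy "checkpoints" with one parameter vector each:
print(average_state_dicts([{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}]))
# → {'w': [2.0, 3.0]}
```

Averaged checkpoints often score a little above both checkpoint_last.pt and checkpoint_best.pt, which may account for part of the gap against the reported number.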
Thank you!