lancopku / Prime

A simple module that consistently outperforms self-attention and the Transformer model on major NMT datasets, achieving SoTA performance.

Reproducing IWSLT14-de-en results #8

Closed dguo98 closed 4 years ago

dguo98 commented 4 years ago

Hi there, thanks so much for the great work! I'm currently trying to reproduce the IWSLT14 De-En (Prime model) results on a single P100 GPU. I followed the exact script at https://github.com/lancopku/Prime/blob/master/examples/parallel_intersected_multi-scale_attention(Prime)/README.md. However, I'm unable to reproduce the results: training finishes with a perplexity above 100, and the BLEU score is below 30.

Do you have any suggestions? What is the expected perplexity / curve?

zhaoguangxiang commented 4 years ago

Thank you for your interest. I recently uploaded the training and evaluation log for IWSLT14 De-En to help you check the reproduction process. The expected valid ppl is 4.6+. The latest version of this GitHub repository reaches 4.6+ valid ppl, but the BLEU score is not always the same; we will list the original environment settings later.
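
If it helps when checking against the uploaded log, here is a minimal sketch for pulling the validation perplexity out of it. It assumes the log follows standard fairseq-style formatting (validation lines containing `ppl <value>`); the file name below is a placeholder.

```python
import re

# Placeholder name: point this at the uploaded IWSLT14 De-En training log.
LOG_PATH = "iwslt14_de_en_train.log"

valid_ppls = []
with open(LOG_PATH) as f:
    for line in f:
        # Only inspect validation lines; skip per-step training logs.
        if "valid" not in line:
            continue
        match = re.search(r"ppl\s+([0-9]+(?:\.[0-9]+)?)", line)
        if match:
            valid_ppls.append(float(match.group(1)))

if valid_ppls:
    print(f"best valid ppl:  {min(valid_ppls):.2f}")
    print(f"final valid ppl: {valid_ppls[-1]:.2f}")
else:
    print("no validation ppl entries found; check the log format")
```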

dguo98 commented 4 years ago

Thanks!! I'll try it out! @zhaoguangxiang

For WMT14, how exactly did you use "compound splitting"?

zhaoguangxiang commented 4 years ago

> Thanks!! I'll try it out! @zhaoguangxiang
>
> For WMT14, how exactly did you use "compound splitting"?

Yes, we used compound splitting for WMT14 En-De.
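
For context: "compound splitting" in WMT14 En-De reporting conventionally means splitting hyphenated compounds in both hypothesis and reference before computing tokenized BLEU (the convention used in the GNMT/Transformer evaluations). Below is a minimal sketch of that step, assuming this is the convention meant here; the exact script Prime used is not stated in this thread, and the file names are placeholders.

```python
import re

def split_compounds(line: str) -> str:
    """Split hyphenated compounds, e.g. "well-known" becomes
    "well ##AT##-##AT## known", following the usual WMT14 En-De
    compound-splitting step applied to hypothesis and reference
    before tokenized BLEU is computed."""
    return re.sub(r"-", " ##AT##-##AT## ", line)

# Placeholder file names for illustration.
for src, dst in [("hyp.tok", "hyp.tok.split"), ("ref.tok", "ref.tok.split")]:
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            fout.write(split_compounds(line.rstrip("\n")) + "\n")

# BLEU is then scored on the *.split files with the usual multi-bleu script.
```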