microsoft / ProphetNet

A research project for natural language generation, containing the official implementations by MSRA NLC team.

GENIE: decoder_nll loss is always equal to 0 for training from scratch #58

Open BaohaoLiao opened 1 year ago

BaohaoLiao commented 1 year ago

Hi @qiweizhen,

I'm trying to reproduce your reported result for training from scratch on XSum. However, the decoder_nll loss is always exactly 0, which is quite odd since it is a cross-entropy loss.

If I load your pre-trained model instead, the loss is not 0. Do you know the reason?
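For reference, this is not GENIE's actual code, only a minimal sketch (with illustrative names and shapes) of how a decoder_nll term is typically computed in embedding-space diffusion LMs: token-level cross-entropy between the LM head's logits over the denoised embeddings and the gold token ids. A true cross-entropy over a non-trivial vocabulary is essentially never exactly 0, so a logged 0.0 usually means the term is skipped/masked for the sampled timesteps or the inputs to it are degenerate.

```python
import torch
import torch.nn.functional as F

def decoder_nll(logits: torch.Tensor, target_ids: torch.Tensor,
                pad_id: int = 0) -> torch.Tensor:
    """Hypothetical sketch of a decoder_nll term (not GENIE's implementation).

    logits:     (batch, seq_len, vocab_size) LM-head scores over the
                denoised embeddings at t = 0.
    target_ids: (batch, seq_len) gold token ids.
    """
    loss = F.cross_entropy(
        logits.transpose(1, 2),   # (batch, vocab, seq) layout expected by F.cross_entropy
        target_ids,
        ignore_index=pad_id,
        reduction="none",
    )                             # per-token NLL, (batch, seq_len)
    mask = (target_ids != pad_id).float()
    # Average over non-padding positions; if every position were masked,
    # this would silently return 0, which is one way a logged 0.0 can arise.
    return (loss * mask).sum() / mask.sum().clamp(min=1.0)
```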

LittlePeaPea commented 1 year ago

Hi, I'm also trying to reproduce the XSum result from scratch, using the recommended parameters in the README, but the ROUGE score I get is much lower than the one reported in the paper. Any suggestions? @qiweizhen, thank you!

lzh0525 commented 1 year ago

Diffusion models without pre-training often require more training steps. If you want to reproduce the results from scratch, you need to set --lr_anneal_steps to a larger value (e.g., 400k steps for XSum). We hope this suggestion helps. We have noticed that our description of training from scratch in the README has caused some misunderstanding, and we will update and correct it in the next version. Thank you for your feedback.
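To get a feel for how large a budget 400k steps is, here is a rough back-of-the-envelope sketch. The dataset size and batch size below are assumptions for illustration only (XSum's training set is on the order of 200k pairs); only the 400k figure comes from the reply above.

```python
# Rough step-budget arithmetic; all values except lr_anneal_steps are assumptions.
train_examples = 204_000      # approximate XSum training-set size (assumption)
global_batch_size = 64        # placeholder; use whatever your setup actually runs
lr_anneal_steps = 400_000     # value suggested above for from-scratch training

steps_per_epoch = train_examples // global_batch_size
epochs = lr_anneal_steps / steps_per_epoch
print(f"{steps_per_epoch} steps/epoch -> ~{epochs:.0f} epochs at 400k steps")
```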