BaohaoLiao opened this issue 1 year ago (status: Open)
Hi, I'm also trying to reproduce the XSum result from scratch, using the recommended parameters in the README. However, I cannot reproduce the ROUGE score; mine is much lower than the score reported in the paper. Any suggestions? @qiweizhen, thank you!
Diffusion models without pre-training often require more training steps. If you want to reproduce the results from scratch, you need to set --lr_anneal_steps to a larger value (e.g. 400k steps for XSum). We hope this suggestion helps.
We have noticed that our description of training from scratch in the README has caused some misunderstanding, and we will update and correct it in the next version. Thank you for your feedback.
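To illustrate why the value of --lr_anneal_steps matters, here is a minimal sketch of a linear annealing schedule. It assumes the learning rate decays linearly to zero over the annealing horizon, which is a common convention but is not confirmed by this thread; the function name `annealed_lr` and the base learning rate are hypothetical.

```python
def annealed_lr(base_lr: float, step: int, lr_anneal_steps: int) -> float:
    """Linearly decay base_lr to 0 over lr_anneal_steps (assumed schedule)."""
    if lr_anneal_steps <= 0:
        return base_lr  # treat a non-positive horizon as "no annealing"
    frac_done = min(step / lr_anneal_steps, 1.0)
    return base_lr * (1.0 - frac_done)


base_lr = 1e-4  # hypothetical value, not taken from the README
for total_steps in (100_000, 400_000):
    # Compare the learning rate at the same steps under two horizons.
    rates = [annealed_lr(base_lr, s, total_steps) for s in (0, 50_000, 100_000)]
    print(total_steps, rates)
```

Under this assumption, a short horizon drives the learning rate to zero early, while a 400k horizon still leaves most of the learning rate available at step 100k.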
Hi @qiweizhen,
I'm trying to reproduce your reported from-scratch training result for XSum. However, the decoder_nll loss is always equal to 0, which is quite odd since it's a cross-entropy loss.
If I load your pre-trained model, it is not equal to 0. Do you know the reason?
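For context on what a zero value implies: the cross-entropy for a single token is the negative log of the softmax probability assigned to the target, so it prints as 0 only when that probability is numerically indistinguishable from 1. The snippet below is a minimal sketch of that relationship in plain Python. It is not taken from the repository, and the logit values are made up for illustration.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy of one prediction: -log softmax(logits)[target]."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


# Uniform logits over 4 classes: the loss equals log(4).
print(cross_entropy([0.0, 0.0, 0.0, 0.0], 0))

# One logit far above the rest and matching the target: the loss is ~0.
print(cross_entropy([50.0, 0.0, 0.0, 0.0], 0))

# Same logits but the wrong target: the loss is large.
print(cross_entropy([50.0, 0.0, 0.0, 0.0], 1))
```

This only shows when the loss can reach 0; it does not identify why that happens in from-scratch training here.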