microsoft / MASS

MASS: Masked Sequence to Sequence Pre-training for Language Generation
https://arxiv.org/pdf/1905.02450.pdf

Why does the valid_ppl become larger as the training progresses? #115

Open jx1100370217 opened 4 years ago

jx1100370217 commented 4 years ago

Why does the valid_ppl become larger as the training progresses?