While training, I am observing that the perplexity (ppl) score decreases up to 10000 steps but starts increasing after that. I am not able to understand this behaviour. Do you have any idea?
Maybe the model is overfitting. But perplexity alone cannot represent the quality of the response. Experimentally, the response from a slightly overfitted model is better than the response from the checkpoint with the lowest ppl score.
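If it helps, here is a rough sketch of how one might locate the lowest-ppl checkpoint and also keep a few later ones to compare by reading their responses. It assumes ppl is logged per checkpoint as (step, ppl) pairs; the numbers and the helper names (`perplexity`, `lowest_ppl_step`, `candidate_checkpoints`) are made up for illustration and are not taken from any particular training script.

```python
import math


def perplexity(token_nlls):
    """Perplexity = exp(mean negative log-likelihood per token, in nats)."""
    return math.exp(sum(token_nlls) / len(token_nlls))


def lowest_ppl_step(history):
    """Return the (step, ppl) pair with the lowest perplexity."""
    return min(history, key=lambda pair: pair[1])


def candidate_checkpoints(history, extra=2):
    """Return the lowest-ppl checkpoint plus up to `extra` later ones.

    The later ones are slightly past the ppl minimum, so their responses
    can be compared by hand against the lowest-ppl checkpoint.
    """
    best_index = history.index(lowest_ppl_step(history))
    return history[best_index : best_index + 1 + extra]


# Hypothetical log of (step, ppl); values are invented for illustration only.
history = [
    (2000, 95.0),
    (6000, 60.0),
    (10000, 48.0),
    (14000, 51.0),
    (18000, 57.0),
]

print("lowest ppl at:", lowest_ppl_step(history))
print("checkpoints to compare:", candidate_checkpoints(history))
```

The idea is only to avoid picking a checkpoint by ppl alone: generate responses from each candidate and judge them directly.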