Open Precola opened 3 months ago
When I want to compute the perplexity of GPT2, how many epochs of training are suitable?
After training for 400 epochs, the perplexity is about 35. Will the perplexity keep going down with more epochs?
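For reference, here is how I understand perplexity to be defined, independent of any particular training setup: the exponential of the mean negative log-likelihood per token. A minimal sketch (the `token_log_probs` input is a hypothetical list of per-token log-probabilities from the model, not part of this repo's code):

```python
import math

def perplexity(token_log_probs):
    """Perplexity = exp of the mean negative log-likelihood per token."""
    nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(nll)

# Sanity check: a model that assigns probability 1/35 to every
# token has perplexity exactly 35.
log_probs = [math.log(1 / 35)] * 100
print(perplexity(log_probs))  # ~35.0
```

So a reported perplexity of 35 means the model is, on average, as uncertain as if it were choosing uniformly among 35 tokens at each step.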