huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

frequent checkpoints have worse performance #6136

Closed. wyin-Salesforce closed this issue 3 years ago

wyin-Salesforce commented 4 years ago

❓ Questions & Help

Hi all, I often notice an issue when training a model and evaluating on a dev set. Usually we evaluate on the dev set after each epoch; let's call this setting A. But we often want to check the system more frequently, so we may evaluate, for example, every 1/5 of an epoch; let's call this setting B.

What I noticed is that A and B end up with totally different performance. Since B evaluates more often, I expected it to match, or at least come very close to, A's results at the points where the two coincide (after 5/5 of the training set, 10/5 of the training set, etc.), but they are very different. For example, when I train a textual entailment model on the RTE dataset, A gives me about 86% accuracy on dev, but B only gives about 80%.

What's the issue here? Thanks.
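For concreteness, here is how the two settings map onto the Trainer API (a minimal sketch with illustrative epoch and step counts, not values from the original report; argument names like `evaluation_strategy` may differ slightly across transformers versions):

```python
from transformers import TrainingArguments

# Setting A: evaluate on the dev set once per epoch.
args_a = TrainingArguments(
    output_dir="out_a",
    num_train_epochs=3,
    evaluation_strategy="epoch",
)

# Setting B: evaluate every 1/5 of an epoch. With, say, 1000 update
# steps per epoch, that is one evaluation every 200 steps.
args_b = TrainingArguments(
    output_dir="out_b",
    num_train_epochs=3,
    evaluation_strategy="steps",
    eval_steps=200,
)
```

In principle the evaluation cadence should not affect the training trajectory at all, which is what makes the reported gap surprising.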


mithunpaul08 commented 4 years ago

Hi, can you post the link to the Stack Overflow question?

Btw, I also face this issue when working with an RTE dataset and have raised an issue here: https://github.com/huggingface/transformers/issues/5863. My dev values after each epoch don't match up when the total number of epochs changes. Now it's making me wonder if it's RTE-specific.
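One thing worth ruling out (an assumption on my part, not something established in this thread): if evaluation draws from the same random number generator as training, then evaluating more often shifts the batch order and dropout masks that follow, and runs with different eval frequencies will diverge. A minimal sketch of how to protect the training RNG around an evaluation call, assuming plain PyTorch; `evaluate_fn` here is a hypothetical zero-argument callable standing in for your dev-set evaluation:

```python
import torch
from transformers import set_seed

set_seed(42)  # seeds Python, NumPy, and torch in one call

def eval_with_rng_protection(evaluate_fn):
    """Run evaluation, then restore the RNG state training was using.

    Only torch's CPU and CUDA generators are snapshotted here; if the
    dataloader uses its own generator, that would need the same treatment.
    """
    cpu_state = torch.get_rng_state()
    cuda_states = torch.cuda.get_rng_state_all() if torch.cuda.is_available() else None
    try:
        return evaluate_fn()
    finally:
        torch.set_rng_state(cpu_state)
        if cuda_states is not None:
            torch.cuda.set_rng_state_all(cuda_states)
```

If the dev numbers still depend on how often (or how long) you train and evaluate after pinning the RNG this way, the cause is more likely elsewhere, e.g. the learning-rate schedule, which does legitimately change when the total number of epochs changes.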

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.