huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

New `save_strategy` option called "best" to save when a new best performance is achieved. #31626

Open seanswyi opened 2 days ago

seanswyi commented 2 days ago

Feature request

Introduce a new option, "best", for the `save_strategy` argument that saves the model whenever a new best performance is achieved.

Motivation

The save_strategy argument was first introduced a few years ago in https://github.com/huggingface/transformers/pull/10286. Currently the supported options are "no", "epoch", and "steps". I'm assuming that this is to match the IntervalStrategy that's used by evaluation as well.

Judging by a conversation on the Hugging Face discussion forum, the best model is always kept by default, so whenever saving occurs during training the best model is among the saved checkpoints (ref: https://discuss.huggingface.co/t/save-only-best-model-in-trainer/8442). If the user finds frequent saving too burdensome, they may set `save_strategy = "no"`, which would save the best model at the end of training.

I believe more flexible saving would be beneficial: users wouldn't have to save so often, but also wouldn't have to wait until the end of training for a checkpoint.

Your contribution

If this feature is deemed worth it by the core maintainers then I'd be willing to take this on myself and open a PR. There are some aspects that I believe might warrant further discussion (e.g., which metric should be used to determine "best," how to override IntervalStrategy, etc.).
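
To make the "which metric" question concrete, here is a minimal sketch of the decision rule a "best" strategy would need. It is only an illustration, not an existing Trainer API: `BestMetricTracker` and `should_save` are hypothetical names, and the two settings are assumed to mirror the existing `metric_for_best_model` and `greater_is_better` training arguments.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BestMetricTracker:
    """Hypothetical helper: decides whether an evaluation result is a new best."""

    metric_name: str = "eval_loss"   # assumed metric key, cf. metric_for_best_model
    greater_is_better: bool = False  # lower is better for a loss
    best_value: Optional[float] = None

    def should_save(self, metrics: dict) -> bool:
        """Return True and update the stored best when `metrics` improves on it."""
        value = metrics[self.metric_name]
        if self.best_value is None:
            improved = True  # the first evaluation is always the best so far
        elif self.greater_is_better:
            improved = value > self.best_value
        else:
            improved = value < self.best_value
        if improved:
            self.best_value = value
        return improved


# Simulated evaluation results from four successive evaluations.
tracker = BestMetricTracker(metric_name="eval_loss", greater_is_better=False)
eval_history = [
    {"eval_loss": 0.9},
    {"eval_loss": 0.7},
    {"eval_loss": 0.8},
    {"eval_loss": 0.6},
]
save_decisions = [tracker.should_save(m) for m in eval_history]
print(save_decisions)  # [True, True, False, True]
```

With a rule like this, a checkpoint would be written only after the first, second, and fourth evaluations. One consequence worth discussing is that saving becomes tied to evaluation, so a "best" strategy would presumably require evaluation to be enabled.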

amyeroberts commented 2 days ago

cc @muellerzr @SunMarc