princeton-nlp / SimCSE

[EMNLP 2021] SimCSE: Simple Contrastive Learning of Sentence Embeddings https://arxiv.org/abs/2104.08821
MIT License

Model from model selection may get overwritten accidentally #146

Closed TJKlein closed 2 years ago

TJKlein commented 2 years ago

If one is not careful and forgets to set the argument

--load_best_model_at_end

the model obtained by model selection will be overwritten by the 'current' model directly after the training step:

https://github.com/princeton-nlp/SimCSE/blob/30b08875a39d0e89d71f17c57bd0dcc18e7c2f15/train.py#L550

Since saving the model after training is only there to store the tokenizer, I recommend moving that call up, before the training step:

https://github.com/princeton-nlp/SimCSE/blob/30b08875a39d0e89d71f17c57bd0dcc18e7c2f15/train.py#L549
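To make the ordering problem concrete, here is a minimal toy sketch. It is not the SimCSE or Hugging Face Trainer code: `ToyTrainer`, its `disk` dictionary (standing in for the output directory), the validation scores, and the `run` helper are all hypothetical, and the sketch assumes that model selection writes the best checkpoint to the same location that `save_model()` later writes to.

```python
# Toy stand-in for a trainer; NOT the SimCSE or Hugging Face implementation.
class ToyTrainer:
    def __init__(self, eval_scores, load_best_model_at_end=False):
        self.eval_scores = eval_scores        # hypothetical per-step validation scores
        self.load_best_model_at_end = load_best_model_at_end
        self.current = None                   # step whose weights are in memory
        self.disk = {}                        # stands in for the output directory

    def train(self):
        best = None
        for step, score in enumerate(self.eval_scores):
            self.current = step
            if best is None or score > self.eval_scores[best]:
                best = step
                self.disk["model"] = step     # model selection: keep the best checkpoint
        if self.load_best_model_at_end:
            self.current = best               # reload the selected weights into memory

    def save_model(self):
        self.disk["model"] = self.current     # writes whatever is in memory right now
        self.disk["tokenizer"] = "tokenizer files"


def run(save_first, load_best=False):
    trainer = ToyTrainer([0.70, 0.82, 0.75], load_best_model_at_end=load_best)
    if save_first:
        trainer.save_model()                  # proposed order: store tokenizer, then train
        trainer.train()
    else:
        trainer.train()
        trainer.save_model()                  # reported order: last weights replace the best
    return trainer.disk


print(run(save_first=False))                  # model from the last step (2), not the best (1)
print(run(save_first=False, load_best=True))  # best step (1) survives
print(run(save_first=True))                   # best step (1) survives, tokenizer still stored
```

In this toy setup, saving before training keeps both the tokenizer and the selected model without relying on the flag being set.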