princeton-nlp / SimCSE

[EMNLP 2021] SimCSE: Simple Contrastive Learning of Sentence Embeddings https://arxiv.org/abs/2104.08821
MIT License

Which file is actually used for the reported results in the paper? #169

Closed KawaiiNotHawaii closed 2 years ago

KawaiiNotHawaii commented 2 years ago

Hi, it seems that the evaluate function called at the end of training doesn't report performance for all of the STS tasks and transfer tasks. Did you actually use evaluation.py in the 'test' setting to generate the results reported in the paper?

gaotianyu1350 commented 2 years ago

Hi,

You need to run evaluation.py separately to get the full numbers. The evaluation that runs during training only reports dev-set performance.
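
For readers following along, here is a minimal sketch of how the standalone script (evaluation.py, as it is named elsewhere in this thread) might be invoked from Python once training has finished. The flag names and values (`--model_name_or_path`, `--pooler`, `--task_set`, `--mode`) are written from memory of the repository README, so treat them as assumptions and check them against the script's argument parser. The checkpoint path is a placeholder.

```python
import shlex
import sys


def build_eval_command(model_name_or_path, mode="test", task_set="full", pooler="cls"):
    """Assemble the argument list for a standalone evaluation.py run.

    Flag names are assumptions recalled from the repository README;
    verify them against evaluation.py before relying on this.
    """
    return [
        sys.executable, "evaluation.py",
        "--model_name_or_path", model_name_or_path,
        "--pooler", pooler,
        "--task_set", task_set,  # assumed choices: sts, transfer, full
        "--mode", mode,          # assumed choices include dev and test
    ]


# Placeholder checkpoint path; replace with your own output directory.
cmd = build_eval_command("path/to/your/checkpoint")
print(shlex.join(cmd))

# To actually launch the evaluation from the repository root:
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell-quoting problems if the checkpoint path contains spaces.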

KawaiiNotHawaii commented 2 years ago

So for the STS tasks, the numbers under the 'all' setting are reported, and for the transfer tasks, the 'devacc' numbers are reported? Could you please explain why 'acc' is not used for the transfer tasks?

gaotianyu1350 commented 2 years ago

Hi,

Are you talking about evaluation.py or the in-training evaluation? The in-training one uses dev because it should only report dev numbers. In evaluation.py you can choose to report either dev or test numbers.
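
To make the dev/test distinction concrete, here is a small sketch of picking the transfer-task number for each split. It assumes each task's result is a dictionary carrying both a 'devacc' and an 'acc' entry (the two key names mentioned in this thread). The helper name `transfer_accuracy` and the example values are hypothetical placeholders, not numbers from the paper.

```python
def transfer_accuracy(task_result, split):
    """Return the accuracy for the requested split of one transfer task.

    Assumes `task_result` holds 'devacc' (dev-set accuracy) and
    'acc' (test-set accuracy), as discussed in this thread.
    """
    if split == "dev":
        return task_result["devacc"]  # what an in-training check would report
    if split == "test":
        return task_result["acc"]     # what a final test-mode run would report
    raise ValueError(f"unknown split: {split!r}")


# Placeholder values for illustration only.
example_result = {"devacc": 80.0, "acc": 79.0}
dev_score = transfer_accuracy(example_result, "dev")
test_score = transfer_accuracy(example_result, "test")
print(dev_score, test_score)
```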