princeton-nlp / SimCSE

[EMNLP 2021] SimCSE: Simple Contrastive Learning of Sentence Embeddings https://arxiv.org/abs/2104.08821
MIT License

question about "cls_before_pooler" option #159

Closed ddobokki closed 2 years ago

ddobokki commented 2 years ago

hello! I have a question.

In README.md, the cls_before_pooler argument is explained,

but run_unsup_example.sh uses the cls option.

Is this just a mistake, or was it actually used in the experiments?

In a simple experiment with a Korean PLM, the cls_before_pooler option got better scores.

Thank you!

gaotianyu1350 commented 2 years ago

Hi,

In the unsupervised setting, we train the model with the CLS representation and evaluate it with the representation before the pooler (i.e., cls_before_pooler). This was found to perform best on English STS. However, the gap is not very large, and it could vary across datasets or languages.
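To illustrate the distinction, here is a minimal NumPy sketch of the two pooling options (the names `W`, `b`, and `embed` are illustrative, not the actual SimCSE code): "cls" passes the [CLS] hidden state through an extra Linear + tanh pooler layer (as in BERT's pooler), while "cls_before_pooler" takes the raw [CLS] hidden state and skips that layer.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_size = 8

# Fake encoder output: (seq_len, hidden_size); row 0 is the [CLS] token.
hidden_states = rng.normal(size=(5, hidden_size))

# Hypothetical pooler weights (the extra MLP on top of [CLS]).
W = rng.normal(size=(hidden_size, hidden_size))
b = rng.normal(size=(hidden_size,))

def embed(hidden_states, pooler_type):
    cls = hidden_states[0]             # [CLS] token representation
    if pooler_type == "cls":
        return np.tanh(cls @ W + b)    # pass through the pooler MLP
    elif pooler_type == "cls_before_pooler":
        return cls                     # skip the MLP entirely
    raise ValueError(pooler_type)

train_emb = embed(hidden_states, "cls")                 # used during training
eval_emb = embed(hidden_states, "cls_before_pooler")    # used at evaluation
```

In the unsupervised recipe described above, training uses the "cls" path while evaluation reads out the "cls_before_pooler" path, which is why the training script and the README mention different options.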

ddobokki commented 2 years ago

thank you!! I got it! Thanks for the good explanation! 👍