princeton-nlp / SimCSE

[EMNLP 2021] SimCSE: Simple Contrastive Learning of Sentence Embeddings https://arxiv.org/abs/2104.08821
MIT License

Performance for purely unsupervised training #167

Closed: hxu38691 closed this issue 2 years ago

hxu38691 commented 2 years ago

Hi, I noticed that in the paper the unsupervised model starts from a pre-trained BERT checkpoint. Would it achieve similar results if trained entirely from scratch? Thank you.

gaotianyu1350 commented 2 years ago

Hi,

Yes, unsupervised SimCSE needs to be trained from a pre-trained checkpoint. Training from scratch won't work.
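
For context, here is a minimal sketch of the kind of in-batch contrastive objective that unsupervised SimCSE optimizes, written in plain Python so it runs without any model download. It is an illustration, not the repository's training code: the function names `cosine` and `simcse_unsup_loss` and the toy vectors are hypothetical. It assumes `z1[i]` and `z2[i]` are two embeddings of the same sentence, which in practice would come from two dropout-noised forward passes of an encoder initialized from a pre-trained checkpoint.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def simcse_unsup_loss(z1, z2, temperature=0.05):
    """Mean in-batch contrastive (InfoNCE-style) loss.

    z1[i] and z2[i] form the positive pair for sentence i;
    every z2[j] with j != i serves as a negative for z1[i].
    The temperature default is an assumption for this sketch.
    """
    n = len(z1)
    total = 0.0
    for i in range(n):
        # Scaled similarities of sentence i to every candidate in the batch.
        logits = [cosine(z1[i], z2[j]) / temperature for j in range(n)]
        # Numerically stable log-sum-exp over the batch.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        # Cross-entropy with the matching index i as the target.
        total += log_sum_exp - logits[i]
    return total / n


# Toy embeddings (made up for illustration, not real model outputs).
view_a = [[1.0, 0.0], [0.0, 1.0]]
view_b_aligned = [[0.9, 0.1], [0.1, 0.9]]   # positives match
view_b_shuffled = [[0.1, 0.9], [0.9, 0.1]]  # positives mismatched

loss_aligned = simcse_unsup_loss(view_a, view_b_aligned)
loss_shuffled = simcse_unsup_loss(view_a, view_b_shuffled)
print(loss_aligned < loss_shuffled)
```

The loss only rewards telling a sentence's second view apart from the other sentences in the batch, so it refines an encoder that already represents sentences well. That is consistent with the answer above that a pre-trained starting point is needed.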
