princeton-nlp / SimCSE

[EMNLP 2021] SimCSE: Simple Contrastive Learning of Sentence Embeddings https://arxiv.org/abs/2104.08821
MIT License

Does SimCSE achieve better results with hard data augmentation? #172

Closed dongrixinyu closed 2 years ago

dongrixinyu commented 2 years ago

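I'd like to ask: when the same sample is passed through the model with different dropout masks, what if we also apply a data augmentation to it, such as synonym replacement, swapping character order, or randomly inserting or deleting non-essential characters? This gives two versions of the sample, which are fed to the model under different dropout masks and used for training. Would this produce better results?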

gaotianyu1350 commented 2 years ago

Hi,

We tried several data augmentation methods in our experiments (see Table 1 in the paper), and none of them worked as well as simply using the same sentence twice.
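For readers comparing the two setups, here is a minimal sketch in plain Python. It is not the repository's training code: `toy_encode`, `delete_random_word`, and `build_views` are hypothetical helpers, the hashed bag-of-words encoder only stands in for a real model, and the dropout rate of 0.1 and temperature of 0.05 are assumed values. With `augment=False` both views come from the same sentence and differ only by dropout; with `augment=True` the second view is also perturbed, as the question proposes.

```python
import math
import random
import zlib


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def info_nce_loss(views_a, views_b, temperature=0.05):
    """In-batch contrastive loss: views_a[i] and views_b[i] form a positive
    pair, and every views_b[j] with j != i serves as a negative."""
    losses = []
    for i, anchor in enumerate(views_a):
        logits = [cosine(anchor, other) / temperature for other in views_b]
        max_logit = max(logits)  # subtract the max for numerical stability
        log_denominator = max_logit + math.log(
            sum(math.exp(logit - max_logit) for logit in logits)
        )
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


def dropout(vector, rate, rng):
    """Inverted dropout: zero each entry with probability `rate` and
    rescale the surviving entries by 1 / (1 - rate)."""
    return [0.0 if rng.random() < rate else x / (1.0 - rate) for x in vector]


def toy_encode(sentence, dim=16):
    """Hypothetical stand-in for a real encoder: hashed bag of words."""
    vector = [0.0] * dim
    for word in sentence.lower().split():
        vector[zlib.crc32(word.encode("utf-8")) % dim] += 1.0
    return vector


def delete_random_word(sentence, rng):
    """Hypothetical discrete augmentation: drop one word at random."""
    words = sentence.split()
    if len(words) < 2:
        return sentence
    del words[rng.randrange(len(words))]
    return " ".join(words)


def build_views(sentences, rng, augment=False, rate=0.1):
    """Return two lists of embeddings, one pair of views per sentence."""
    views_a, views_b = [], []
    for sentence in sentences:
        second = delete_random_word(sentence, rng) if augment else sentence
        views_a.append(dropout(toy_encode(sentence), rate, rng))
        views_b.append(dropout(toy_encode(second), rate, rng))
    return views_a, views_b
```

Calling `info_nce_loss(*build_views(batch, random.Random(0), augment=True))` on a list of sentences gives the loss for the augmented variant, and `augment=False` gives the dropout-only variant. The toy encoder says nothing about which variant trains better; it only shows where the two setups differ.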
