Does SimCSE achieve better results with hard data augmentation?

princeton-nlp / SimCSE

[EMNLP 2021] SimCSE: Simple Contrastive Learning of Sentence Embeddings https://arxiv.org/abs/2104.08821

MIT License


Closed dongrixinyu closed 2 years ago

dongrixinyu commented 2 years ago

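I'd like to ask: when the same sample is encoded twice with different dropout masks, what if we also apply data augmentation to that sample, such as synonym replacement, swapping character order, or randomly inserting and deleting non-key characters? This gives two samples, which are fed to the model under different dropout masks and then used for training. Would the results be better?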

gaotianyu1350 commented 2 years ago

Hi,

We also tried many data augmentation methods in our experiments (see Table 1), and none of them worked as well as simply using the same sentence.
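
For readers who want to see the two setups side by side, here is a minimal sketch in plain Python. It is not the SimCSE code: `word_vector`, `encode`, `delete_random_word`, `make_pairs`, and `info_nce` are hypothetical helpers, the "encoder" is just a mean of fixed pseudo-random word vectors followed by dropout, and the dropout rate (0.1) and temperature (0.05) are assumed values. It only shows how the positive pair is built in each case (same sentence with two dropout masks, versus an augmented copy with its own dropout mask) and how an in-batch contrastive loss would score it.

```python
import math
import random


def word_vector(word, dim=16):
    # Hypothetical stand-in for learned embeddings:
    # a fixed pseudo-random vector per word, seeded by the word itself.
    rng = random.Random(word)
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def encode(sentence, rng, p_drop=0.1, dim=16):
    # Toy encoder (assumption): mean of word vectors, then inverted dropout.
    # Each call draws a fresh dropout mask from rng.
    vecs = [word_vector(w, dim) for w in sentence.split()]
    mean = [sum(col) / len(vecs) for col in zip(*vecs)]
    return [0.0 if rng.random() < p_drop else x / (1.0 - p_drop) for x in mean]


def delete_random_word(sentence, rng):
    # One simple discrete augmentation: drop a single random word.
    words = sentence.split()
    if len(words) < 2:
        return sentence
    del words[rng.randrange(len(words))]
    return " ".join(words)


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def info_nce(anchors, positives, temperature=0.05):
    # In-batch contrastive loss: positives[i] is the positive for anchors[i];
    # every other positives[j] in the batch acts as a negative.
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        top = max(logits)  # subtract the max for numerical stability
        log_denom = top + math.log(sum(math.exp(l - top) for l in logits))
        total += log_denom - logits[i]
    return total / len(anchors)


def make_pairs(sentences, rng, augment=None):
    # augment=None  -> same sentence, two different dropout masks.
    # augment=func  -> augmented copy, also with its own dropout mask.
    anchors = [encode(s, rng) for s in sentences]
    positives = [encode(augment(s, rng) if augment else s, rng) for s in sentences]
    return anchors, positives


rng = random.Random(0)
batch = [
    "two dogs are running in the park",
    "a man is playing the guitar",
    "the stock market fell sharply today",
]
loss_dropout_only = info_nce(*make_pairs(batch, rng))
loss_with_deletion = info_nce(*make_pairs(batch, rng, augment=delete_random_word))
print("dropout only:", loss_dropout_only)
print("dropout + word deletion:", loss_with_deletion)
```

The numbers this toy prints say nothing about which setup trains a better model; that comparison is what the experiments mentioned above address. The sketch only makes the difference in pair construction concrete.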
