princeton-nlp / SimCSE

[EMNLP 2021] SimCSE: Simple Contrastive Learning of Sentence Embeddings https://arxiv.org/abs/2104.08821
MIT License

What are the negative samples if the hard negatives are removed? (training with the supervised version) #221

Closed: Rachel-Yeah-Lee closed this issue 1 year ago

Rachel-Yeah-Lee commented 1 year ago

Hi, I followed your supervised example but removed the hard negatives from the dataset. In this case, where do the negative samples come from? Does it take the other 'sent0's or 'sent1's in the same batch as negative samples? Could you give me some instructions? Thanks a lot!

wuxiangli91 commented 1 year ago

> Hi, I followed your supervised example but removed the hard negatives from the dataset. In this case, where do the negative samples come from? Does it take the other 'sent0's or 'sent1's in the same batch as negative samples? Could you give me some instructions? Thanks a lot!

I think the negative samples come from the other examples in the batch, i.e., in-batch negatives.
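
Roughly, training without hard negatives reduces to a cross-entropy loss over in-batch similarities. Here is a minimal sketch of that objective (the function name and the 0.05 temperature are illustrative, not the repository's exact code):

```python
import torch
import torch.nn.functional as F

def in_batch_contrastive_loss(z0, z1, temperature=0.05):
    """z0, z1: (batch, dim) embeddings of the sent0 / sent1 columns."""
    z0 = F.normalize(z0, dim=-1)
    z1 = F.normalize(z1, dim=-1)
    # (batch, batch) cosine similarities: row i compares z0[i] with every z1[j]
    sim = z0 @ z1.t() / temperature
    # The diagonal entries are the positive pairs; every off-diagonal entry
    # in a row acts as an in-batch negative for that sentence
    labels = torch.arange(z0.size(0), device=z0.device)
    return F.cross_entropy(sim, labels)
```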

Rachel-Yeah-Lee commented 1 year ago

Thanks for @wuxiangli91's reply! Do you mean that if I have a batch of size 64, each input sentence will have 63 negative samples? Or do you think it will have 126 negative samples, since there are 64 sentence pairs (128 sentences) in the batch?

gaotianyu1350 commented 1 year ago

Hi, in our current implementation there will be 63 negatives: each sent0 is compared against the 64 sent1 embeddings in the batch, one of which is its positive and the other 63 of which are negatives.
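
To make the count concrete with the sketch above (shapes are illustrative, assuming 768-dimensional embeddings):

```python
# With batch size 64, `sim` is a 64x64 matrix: for each sent0 the diagonal
# entry is its positive (its own sent1), and the remaining 63 entries in
# that row are the in-batch negatives.
z0, z1 = torch.randn(64, 768), torch.randn(64, 768)  # dummy embeddings
loss = in_batch_contrastive_loss(z0, z1)             # 1 positive, 63 negatives per row
```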