princeton-nlp / SimCSE

[EMNLP 2021] SimCSE: Simple Contrastive Learning of Sentence Embeddings https://arxiv.org/abs/2104.08821
MIT License

What is the ratio of positive to negative pairs during training? #176

Closed IT-coach-666 closed 2 years ago

IT-coach-666 commented 2 years ago

What is the ratio of positive pairs to negative pairs during training?

gaotianyu1350 commented 2 years ago

Hi,

We use in-batch negatives, which means that for every positive pair, all the other sentences in the batch serve as negatives. So the ratio depends on the batch size.
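To make the dependence on batch size concrete, here is a minimal counting sketch. It is not taken from the SimCSE repository: the function name `in_batch_pair_counts` is hypothetical, and it assumes exactly one positive per anchor sentence and no additional hard negatives.

```python
from typing import Tuple


def in_batch_pair_counts(batch_size: int) -> Tuple[int, int]:
    """Count (positive pairs, negative pairs) in one batch.

    Assumption (illustrative, not the repository's code): each anchor has
    exactly one positive, and every other sentence in the batch is a negative.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    positives = batch_size                      # one positive per anchor
    negatives = batch_size * (batch_size - 1)   # all other in-batch sentences
    return positives, negatives


for n in (64, 128, 512):
    pos, neg = in_batch_pair_counts(n)
    # Under these assumptions there are (n - 1) negatives per positive.
    print(f"batch_size={n}: positives={pos}, negatives={neg}, ratio=1:{neg // pos}")
```

Under these assumptions the ratio is 1:(N - 1) for a batch of N sentences, so it cannot be set independently of the batch size.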

IT-coach-666 commented 2 years ago

Thanks for your reply! I think that in contrastive learning it is important to balance the ratio of positive to negative pairs, so I would like to know whether you ran experiments with different batch sizes, since your paper doesn't mention that. At the same time, changing the ratio of positive to negative pairs only by changing the batch size does not seem entirely sound. Do you plan to vary the ratio of positive to negative pairs within a batch to investigate this further? Thanks.

gaotianyu1350 commented 2 years ago

Hi,

We found that batch size doesn't matter that much in SimCSE (we tried increasing the batch size further but saw no significant difference). The common belief about the InfoNCE loss, though, is that the more negatives the better (the negatives are in the denominator, so there is no need to balance them against the positives).
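The point about the denominator can be illustrated with a small pure-Python sketch of an InfoNCE-style loss with in-batch negatives. This is an illustration under stated assumptions, not the repository's implementation: the helper names `cosine` and `info_nce_loss` are hypothetical, similarity is assumed to be cosine, and the temperature default is only an illustrative choice.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("zero vector has no cosine similarity")
    return dot / (norm_u * norm_v)


def info_nce_loss(
    anchors: Sequence[Sequence[float]],
    positives: Sequence[Sequence[float]],
    temperature: float = 0.05,  # illustrative value (assumption)
) -> float:
    """Mean InfoNCE loss; positives[j] with j != i act as negatives for anchors[i]."""
    n = len(anchors)
    if n == 0 or n != len(positives):
        raise ValueError("need equally sized, non-empty batches")
    total = 0.0
    for i in range(n):
        logits = [cosine(anchors[i], positives[j]) / temperature for j in range(n)]
        # Numerically stable log-sum-exp over the positive and all negatives.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        # -log(exp(pos) / sum_j exp(logit_j)): negatives enter only via the sum.
        total += log_denominator - logits[i]
    return total / n
```

Adding sentences to the batch only adds terms to the sum inside the logarithm, which is why no explicit positive-to-negative balancing appears in this formulation.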

IT-coach-666 commented 2 years ago

Thanks.