princeton-nlp / SimCSE

[EMNLP 2021] SimCSE: Simple Contrastive Learning of Sentence Embeddings https://arxiv.org/abs/2104.08821
MIT License

Questions about the supervised SimCSE settings #86

Closed Yusics closed 3 years ago

Yusics commented 3 years ago

First of all, thank you so much for your great work!

I have a question about the supervised SimCSE setting. From my understanding, you didn't apply dropout data augmentation in the supervised SimCSE setting. Is this correct?

If so, can I ask the reason? Why do you think adding dropout data augmentation wouldn't help in the supervised setting?

Thanks!

gaotianyu1350 commented 3 years ago

Hi,

The "dropout" augmentation is just the standard dropout in Transformers, so it is applied in the supervised setting.

Yusics commented 3 years ago

Thanks for your prompt reply. I should make my question clearer.

In your unsupervised setting, you augment the data by feeding the same sentence through the model twice, and you use the two resulting embeddings to compute the contrastive loss.

I didn't see this kind of augmented data being used to compute the contrastive loss in your supervised setting. Is this correct? Thanks.

gaotianyu1350 commented 3 years ago

The "dropout augmentation" is simply the standard dropout in Transformers. In the unsupervised case, because we take the identical sentence as positive instance, we need to feed it twice to get two embeddings (with different dropout). In the supervised case, since the two sentences are different, we encode each of them separately (and they also experience dropout in this process).

Yusics commented 3 years ago

Let me elaborate on my question.

In your unsupervised setting, you use dropout augmentation to generate the positive pairs, and I understand that you encode each sentence separately. Maybe it's better to express my question this way. Say the unsupervised loss is L_unsupervised (positive pairs are two dropout views of the same sentence) and the supervised loss is L_supervised (positive pairs are different sentences with the same label). Since L_unsupervised can increase the uniformity (because the two views of the same sentence are pulled closer), I'm wondering why you don't use L_unsupervised + L_supervised in your supervised setting.

Thanks.
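For reference, the two objectives being discussed, in the notation of the paper (with $\mathbf{h}_i^{z}$ the embedding of sentence $x_i$ under dropout mask $z$, $\mathrm{sim}(\cdot,\cdot)$ cosine similarity, $\tau$ a temperature, and $N$ the batch size); the weighted combination at the end, with coefficient $\lambda$, is only the proposal in this comment, not something the paper trains with:

$$\ell_i^{\text{unsup}} = -\log \frac{e^{\mathrm{sim}(\mathbf{h}_i^{z_i},\, \mathbf{h}_i^{z_i'})/\tau}}{\sum_{j=1}^{N} e^{\mathrm{sim}(\mathbf{h}_i^{z_i},\, \mathbf{h}_j^{z_j'})/\tau}}, \qquad \ell_i^{\text{sup}} = -\log \frac{e^{\mathrm{sim}(\mathbf{h}_i,\, \mathbf{h}_i^{+})/\tau}}{\sum_{j=1}^{N} e^{\mathrm{sim}(\mathbf{h}_i,\, \mathbf{h}_j^{+})/\tau}}$$

$$\ell_i = \ell_i^{\text{sup}} + \lambda\, \ell_i^{\text{unsup}}$$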

gaotianyu1350 commented 3 years ago

Oh, now I get your question. The "uniformity" gain comes solely from the negative part of the loss, and that negative part also exists in the supervised loss, so we didn't try it.
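For context, the alignment/uniformity metrics in question are the ones from Wang and Isola (2020) that the SimCSE paper adopts: pulling two views of the same sentence together improves alignment, while uniformity is measured over random pairs, which is why its gain comes from the negative part of the loss:

$$\ell_{\text{align}} \triangleq \mathop{\mathbb{E}}_{(x,\, x^{+}) \sim p_{\text{pos}}} \left\| f(x) - f(x^{+}) \right\|^{2}, \qquad \ell_{\text{uniform}} \triangleq \log \mathop{\mathbb{E}}_{x,\, y \sim p_{\text{data}}} e^{-2 \left\| f(x) - f(y) \right\|^{2}}$$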

Yusics commented 3 years ago

Got it, thanks! Just curious how well combining these two losses would work. Thanks!
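A hypothetical sketch of the combination asked about here (supervised InfoNCE plus a dropout-based unsupervised term). This is not part of the SimCSE codebase; the helper names and the weight `lam` are made up for illustration:

```python
import torch
import torch.nn.functional as F

def info_nce(anchor, positive, tau=0.05):
    """In-batch InfoNCE: anchor[i] should match positive[i];
    all other positives in the batch act as negatives."""
    sim = F.cosine_similarity(anchor.unsqueeze(1), positive.unsqueeze(0), dim=-1) / tau
    labels = torch.arange(anchor.size(0), device=anchor.device)
    return F.cross_entropy(sim, labels)

# h_prem, h_hyp: embeddings of premises / entailed hypotheses
# h_prem2: a second dropout pass over the same premises
# (see the encoding sketch earlier in the thread)
def combined_loss(h_prem, h_hyp, h_prem2, lam=1.0):
    l_sup = info_nce(h_prem, h_hyp)      # positives are different sentences
    l_unsup = info_nce(h_prem, h_prem2)  # positives are two dropout views
    return l_sup + lam * l_unsup
```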