Closed: Yusics closed this issue 3 years ago
Hi,
The "dropout" augmentation is just the standard dropout in Transformers, so it is applied in the supervised setting.
Thanks for your prompt reply. I should have made my question clearer.
In your unsupervised setting, you augment the data by feeding the same sentence to the model twice and use the resulting embedding pairs to compute the contrastive loss.
I didn't see you use dropout-augmented data to compute the contrastive loss in your supervised setting. Is this correct? Thanks.
The "dropout augmentation" is simply the standard dropout in Transformers. In the unsupervised case, because we take the identical sentence as the positive instance, we need to feed it twice to get two embeddings (with different dropout masks). In the supervised case, since the two sentences are different, we encode each of them separately (and they also go through dropout in this process).
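To make the dropout-as-augmentation point concrete, here is a toy sketch in pure Python. The `encode` function below is a made-up stand-in for a Transformer encoder, not the actual SimCSE code: it just zeroes each "feature" at random with probability `p`, so two passes over the same sentence produce two different views.

```python
import random

def encode(token_ids, p=0.1, seed=None):
    # Toy stand-in for a Transformer encoder with dropout:
    # each feature is randomly zeroed with probability p, so
    # two passes over the same input give different embeddings.
    rng = random.Random(seed)
    return [0.0 if rng.random() < p else float(t) for t in token_ids]

sentence = [3, 1, 4, 1, 5, 9]   # the same sentence, encoded twice
z1 = encode(sentence, seed=1)
z2 = encode(sentence, seed=2)
print(z1 != z2)  # True: different dropout masks give different embeddings
```

In the unsupervised setting these two views form the positive pair; in the supervised setting the positive pair is two genuinely different sentences, so a second pass over the same sentence is not needed.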
Let me elaborate on my question better.
In your unsupervised setting, you have to use dropout augmentation to generate positive and negative pairs. I also understand that you encode each of the sentences separately. Maybe it's better to phrase my question this way. Let's say the original unsupervised loss is L_unsupervised (positive pairs are the same sentences) and the supervised loss is L_supervised (positive pairs are different sentences with the same label). Since L_unsupervised can increase uniformity (because the two views of the same sentence are pulled closer), I'm wondering why you didn't use L_unsupervised + L_supervised in your supervised setting.
Thanks.
Oh, now I get your question. Because the "uniformity" gain comes solely from the negative part, and the negative part also exists in the supervised loss, we didn't try it out.
Got it! Just curious how well combining these two losses would work. Thanks!
First of all, thank you so much for your great work!
I have a question about the supervised SimCSE setting. From my understanding, you didn't apply dropout data augmentation in the supervised SimCSE setting. Is this correct?
If so, can I ask the reason, and why you think adding dropout data augmentation wouldn't help in the supervised setting?
Thanks!