First, thank you for the high quality of your code and the support you give on issues / PRs.
For a research project, I am fine-tuning models with a contrastive objective, and I intend to use your unsupervised method.
Looking at the code of SimCSE, I noticed that in the forward pass all sequences $x$ and $x^+$ are passed through in the same batch. But doing so would apply the same dropout mask to all of them.
My workaround is to perform two forward passes through the BERT encoder, but I still wondered: did you do the same, or am I missing something?
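For what it's worth, here is a minimal toy sketch (my own, not the SimCSE code; the names `dropout`, `x`, and `batch` are just illustrative) of how `torch.nn.Dropout` behaves when the same input is duplicated within one batch. Dropout samples an independent Bernoulli mask per element, so the two copies generally receive different masks even in a single forward pass:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in for the encoder's internal dropout (BERT uses p=0.1 by default).
dropout = nn.Dropout(p=0.1)
dropout.train()  # dropout is only active in training mode

x = torch.ones(1, 768)            # one "sentence" representation
batch = torch.cat([x, x], dim=0)  # x and x+ duplicated in the same batch

out = dropout(batch)

# nn.Dropout samples an independent mask for every element of the batch,
# so the two copies of the same input end up with different masks.
print(torch.equal(out[0], out[1]))  # False (with overwhelming probability)
```

If this toy behavior carries over to the full encoder, duplicating each sentence in the batch would already yield two differently-dropped representations without a second forward pass.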
Hello, I have the same confusion.
In the forward pass, all sequences $x$ and $x^+$ are passed through in the same batch. Does that mean the sentence pairs get the same hidden states? Looking forward to your reply. Thank you.