First, thank you for the high quality of your code and the support you give on issues / PRs.
For a research project, I am fine-tuning models with a contrastive objective, and I intend to use your unsupervised method.
Looking at the code of SimCSE, I noticed that in the forward pass all sequences $x$ and $x^+$ are passed through in the same batch. But doing so would apply the same dropout mask to all of them.
My workaround is to perform two forward passes through the BERT encoder, but I still wondered: did you do the same, or am I missing something?
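For what it's worth, here is a minimal toy sketch (my own, not the SimCSE code; the names `dropout`, `x`, and `batch` are just illustrative) of how `torch.nn.Dropout` behaves when the same input is duplicated within one batch. Dropout samples an independent Bernoulli mask per element, so the two copies generally receive different masks even in a single forward pass:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in for the encoder's internal dropout (BERT uses p=0.1 by default).
dropout = nn.Dropout(p=0.1)
dropout.train()  # dropout is only active in training mode

x = torch.ones(1, 768)            # one "sentence" representation
batch = torch.cat([x, x], dim=0)  # x and x+ duplicated in the same batch

out = dropout(batch)

# nn.Dropout samples an independent mask for every element of the batch,
# so the two copies of the same input end up with different masks.
print(torch.equal(out[0], out[1]))  # False (with overwhelming probability)
```

If this toy behavior carries over to the full encoder, duplicating each sentence in the batch would already yield two differently-dropped representations without a second forward pass.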
Hello, I have the same confusion.
In the forward pass, all sequences $x$ and $x^+$ are passed through in the same batch. Does that mean the sentence pairs get the same hidden states? Looking forward to your reply. Thank you.