princeton-nlp / SimCSE

[EMNLP 2021] SimCSE: Simple Contrastive Learning of Sentence Embeddings https://arxiv.org/abs/2104.08821
MIT License

Unsup SimCSE for Different Batch Sizes #190

Closed hbin0701 closed 2 years ago

hbin0701 commented 2 years ago

Dear Author, thank you for sharing this amazing piece of work :) 👍

I have a question about the paper where it says: "We carry out grid-search of batch size ∈ {64, 128, 256, 512} and learning rate ∈ {1e-5, 3e-5, 5e-5} on STS-B development set and adopt the hyper-parameter settings in Table A.1. We find that SimCSE is not sensitive to batch sizes as long as tuning the learning rates accordingly, which contradicts the finding that contrastive learning requires large batch sizes."

The part "as long as tuning the learning rates accordingly" concerns me. For a batch size of 512, what learning rate would you recommend (for unsupervised SimCSE)? It does not seem easy to figure out 😢 In your experiments, what learning rates did you use for the different batch sizes to confirm this insensitivity?

Thank you in advance! :)

gaotianyu1350 commented 2 years ago

Hi,

We simply tried all combinations of (64, 128, 256, 512) x (1e-5, 3e-5, 5e-5) and picked the one that performed best on the STS-B development set. The combinations we used are listed in Appendix A of our paper.
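The sweep described above can be sketched as a plain exhaustive grid search. This is a minimal illustration, not code from the SimCSE repo: `evaluate_on_stsb_dev` is a hypothetical placeholder that would, in practice, train unsupervised SimCSE with the given hyper-parameters and return the STS-B dev Spearman correlation.

```python
import itertools

def evaluate_on_stsb_dev(batch_size, lr):
    # Hypothetical placeholder, NOT real results: replace with an
    # actual training + evaluation run that returns the STS-B dev
    # Spearman correlation for this (batch_size, lr) combination.
    return -abs(batch_size - 64) * 1e-4 - abs(lr - 1e-5)

def grid_search(batch_sizes=(64, 128, 256, 512),
                learning_rates=(1e-5, 3e-5, 5e-5)):
    """Try every (batch size, learning rate) pair and keep the best."""
    best_cfg, best_score = None, float("-inf")
    for bs, lr in itertools.product(batch_sizes, learning_rates):
        score = evaluate_on_stsb_dev(bs, lr)
        if score > best_score:
            best_cfg, best_score = (bs, lr), score
    return best_cfg, best_score

best_cfg, best_score = grid_search()
```

With 4 batch sizes and 3 learning rates this is only 12 runs, which is why an exhaustive sweep is feasible here.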

hbin0701 commented 2 years ago

Thank you! :)