princeton-nlp / SimCSE

[EMNLP 2021] SimCSE: Simple Contrastive Learning of Sentence Embeddings https://arxiv.org/abs/2104.08821
MIT License

How do you get the supervised NLI dataset? #242

Closed leoozy closed 1 year ago

leoozy commented 1 year ago

Did you sample from SNLI+MNLI, or did you use the whole dataset directly?

gaotianyu1350 commented 1 year ago

We directly combine SNLI and MNLI and use all of the data. Some examples may still be filtered out, because they don't have a corresponding hard negative (a hypothesis with the contradiction label).
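For readers who want to see the filtering idea concretely, here is a minimal sketch. It is not the repository's actual preprocessing script. It assumes each NLI example is a `(premise, hypothesis, label)` tuple, that examples are grouped by exact premise string, and that the first contradiction hypothesis found is used as the hard negative. The function name `build_triplets` and the toy sentences are hypothetical.

```python
from collections import defaultdict


def build_triplets(examples):
    """Turn (premise, hypothesis, label) tuples into (anchor, positive, hard_negative).

    Assumption: premises are matched by exact string, and a premise is dropped
    when it lacks either an entailment or a contradiction hypothesis.
    """
    by_premise = defaultdict(lambda: {"entailment": [], "contradiction": []})
    for premise, hypothesis, label in examples:
        # Neutral pairs are ignored; only positives and hard negatives are kept.
        if label in ("entailment", "contradiction"):
            by_premise[premise][label].append(hypothesis)

    triplets = []
    for premise, hyps in by_premise.items():
        if not hyps["entailment"] or not hyps["contradiction"]:
            continue  # filtered out: no positive or no hard negative
        hard_negative = hyps["contradiction"][0]
        for positive in hyps["entailment"]:
            triplets.append((premise, positive, hard_negative))
    return triplets


# Toy stand-ins for the two corpora (hypothetical sentences, not real data).
snli = [
    ("A man plays guitar.", "A person makes music.", "entailment"),
    ("A man plays guitar.", "A man is asleep.", "contradiction"),
    ("A man plays guitar.", "A man is on stage.", "neutral"),
]
mnli = [
    ("The shop is closed.", "The shop is not open.", "entailment"),
    ("The shop is closed.", "It is Sunday.", "neutral"),
]

# Combine both corpora and use everything that survives the filter.
triplets = build_triplets(snli + mnli)
```

In this toy input the second premise has no contradiction hypothesis, so it yields no triplet, which mirrors the kind of filtering described above.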
