princeton-nlp / SimCSE

[EMNLP 2021] SimCSE: Simple Contrastive Learning of Sentence Embeddings https://arxiv.org/abs/2104.08821
MIT License

 Why add two sentences in prepare_features? #263

Closed FinalFlowers closed 10 months ago

FinalFlowers commented 10 months ago

https://github.com/princeton-nlp/SimCSE/blob/main/train.py#L419C13-L419C13 If sent0 and sent1 are added together here, won't max_length=32 cause more truncation of the resulting sentences?

gaotianyu1350 commented 10 months ago

Hi, here examples[sent0_cname] and examples[sent1_cname] are both lists, so it is a concatenation of two lists instead of two strings.
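
To illustrate the point above, here is a minimal sketch, not the actual code from train.py. `toy_tokenize` is a hypothetical stand-in for a real tokenizer (whitespace split, truncating each sentence separately), the column names `sent0` and `sent1` are assumed, and the final regrouping step is an assumption about how the pairs could be recovered after tokenization.

```python
def toy_tokenize(sentences, max_length):
    """Hypothetical tokenizer: whitespace split, truncating each sentence on its own."""
    return [s.split()[:max_length] for s in sentences]


# A batch in column-oriented form: every column is a list of strings.
examples = {
    "sent0": ["a cat sat on the mat", "dogs bark"],
    "sent1": ["a cat is sitting on a mat", "dogs make noise"],
}

# list + list gives a longer list of separate sentences, not longer strings.
sentences = examples["sent0"] + examples["sent1"]
total = len(examples["sent0"])

# max_length applies to each of the 2 * total sentences independently.
tokens = toy_tokenize(sentences, max_length=4)

# Regroup so that pair i holds the i-th sent0 and the i-th sent1 (assumed step).
pairs = [[tokens[i], tokens[i + total]] for i in range(total)]
```

Because the `+` operates on lists, each sentence keeps its own length budget, so concatenating the columns adds no extra truncation.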

FinalFlowers commented 10 months ago

Got it, thank you!