princeton-nlp / SimCSE

[EMNLP 2021] SimCSE: Simple Contrastive Learning of Sentence Embeddings https://arxiv.org/abs/2104.08821
MIT License

Cropping data augmentation (Table 1) #231

Closed: birajpandey closed this issue 1 year ago

birajpandey commented 1 year ago

The caption of Table 1 in Gao et al. (2021) says that cropping keeps (100 − k)% of the length. When cropping, do you keep partially cut words, or do you round the cut to the nearest space?

gaotianyu1350 commented 1 year ago

Hi,

We always keep or delete whole words (words are separated by spaces).
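To make the reply concrete, here is a minimal Python sketch of word-level cropping and word deletion in which a word is never split. It is not the SimCSE repository's code. The function names `crop_words` and `delete_words` are hypothetical. Several details are assumptions not stated in the thread: the kept or deleted word count is rounded down, at least one word always survives, and the crop is one contiguous span with a uniformly random start.

```python
import random


def crop_words(sentence: str, k: float, rng: random.Random) -> str:
    """Hypothetical crop: keep a contiguous span of about (100 - k)% of the words.

    Assumptions (not from the SimCSE code): the kept length is rounded down,
    at least one word is kept, and the span start is chosen uniformly at random.
    """
    words = sentence.split(" ")  # split on spaces, so only whole words are kept
    n_keep = max(1, int(len(words) * (100 - k) / 100))
    start = rng.randint(0, len(words) - n_keep)  # randint is inclusive on both ends
    return " ".join(words[start:start + n_keep])


def delete_words(sentence: str, k: float, rng: random.Random) -> str:
    """Hypothetical word deletion: drop about k% of the words, never splitting a word.

    Assumption: the deleted count is rounded down and at least one word is kept.
    """
    words = sentence.split(" ")
    n_delete = min(len(words) - 1, int(len(words) * k / 100))
    drop = set(rng.sample(range(len(words)), n_delete))  # indices of whole words to drop
    return " ".join(w for i, w in enumerate(words) if i not in drop)


if __name__ == "__main__":
    rng = random.Random(0)
    s = "we always keep or delete the whole word separated by spaces"
    print(crop_words(s, 20, rng))    # 8 of the 11 words, as one contiguous span
    print(delete_words(s, 20, rng))  # 9 of the 11 words, original order preserved
```

Because the sentence is split on spaces before any cut is made, the output can only ever contain complete words, which matches the behavior described in the reply. How the real implementation rounds counts or places the crop is not specified in this thread.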