richardbaihe / paperreading

NLP papers

arXiv 2021 | SimCSE: Simple Contrastive Learning of Sentence Embeddings #68

Closed richardbaihe closed 3 years ago

richardbaihe commented 3 years ago

Overview & Methodology

[Figure: overview of unsupervised (dropout-based) and supervised (NLI-based) SimCSE]

This paper proposes SimCSE, a contrastive learning framework for sentence representation learning (evaluated on STS tasks), shown in the figure above. The unsupervised variant feeds the same sentence through the encoder twice; because of the standard dropout inside the Transformer, the two passes yield slightly different embeddings, which serve as a positive pair, while the other in-batch sentences act as negatives. The supervised variant instead uses entailment pairs from NLI data as positives and contradiction pairs as hard negatives.
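A minimal PyTorch-style sketch of the unsupervised objective, assuming a generic `encoder` that maps token ids to sentence embeddings (the names and the temperature value are illustrative, not taken from the paper's released code):

```python
import torch
import torch.nn.functional as F

def unsup_simcse_loss(encoder, input_ids, attention_mask, temperature=0.05):
    """Unsupervised SimCSE: encode the same batch twice; since dropout is
    active in training mode, the two passes give slightly different
    embeddings, which serve as positive pairs for a contrastive loss."""
    # Two forward passes with independent dropout masks.
    h1 = encoder(input_ids, attention_mask)   # (batch, dim)
    h2 = encoder(input_ids, attention_mask)   # (batch, dim)

    # Cosine similarity between every pair in the batch, scaled by temperature.
    sim = F.cosine_similarity(h1.unsqueeze(1), h2.unsqueeze(0), dim=-1) / temperature

    # For sentence i, the positive is the i-th embedding from the second pass;
    # all other in-batch sentences act as negatives.
    labels = torch.arange(sim.size(0), device=sim.device)
    return F.cross_entropy(sim, labels)
```

In the paper, the encoder is a pre-trained Transformer (e.g. BERT), with the `[CLS]` representation (plus an MLP head during training) taken as the sentence embedding.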

Table 1 shows the improvements of SimCSE over previous methods in both unsupervised and supervised STS evaluations.

[Table 1: unsupervised and supervised STS results]

Experiment Details

The data used for unsupervised training are 10^6 sentences randomly drawn from English Wikipedia.

Since training with different dropout masks can be viewed as a minimal form of data augmentation, the paper also compares popular discrete augmentation methods on this task in Table 2 (see the sketch below). However, all of them hurt performance relative to the simple dropout-based approach.
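For contrast, a discrete augmentation such as word deletion builds the positive from an explicitly corrupted copy of the input instead of a second dropout pass; a rough sketch (the `delete_words` helper is hypothetical, not from the paper's code):

```python
import random

def delete_words(tokens, p=0.1):
    """Randomly drop a fraction p of tokens -- a typical discrete augmentation."""
    kept = [t for t in tokens if random.random() > p]
    return kept if kept else tokens  # never return an empty sentence

# In the contrastive setup above, the positive pair would then be
# (encode(tokens), encode(delete_words(tokens))) instead of two dropout passes.
```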

[Table 2: comparison of data augmentation methods]

Comparing different training objectives also shows that the SimCSE objective performs best.

[Table: comparison of training objectives]
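For reference, the per-sentence training objective SimCSE optimizes is (as I read the paper) the standard InfoNCE loss over a mini-batch of $N$ sentences, where $h_i$ and $h_i^{+}$ are the two embeddings of sentence $i$ (from different dropout masks in the unsupervised case, or from an entailment pair in the supervised case), $\mathrm{sim}$ is cosine similarity, and $\tau$ is a temperature:

```math
\ell_i = -\log \frac{e^{\mathrm{sim}(h_i,\, h_i^{+})/\tau}}{\sum_{j=1}^{N} e^{\mathrm{sim}(h_i,\, h_j^{+})/\tau}}
```

The supervised variant additionally includes the contradiction embeddings as hard negatives in the denominator.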

Final results:

[Table: final STS results]