Conference :
Link : https://arxiv.org/pdf/2104.08821.pdf
Authors' Affiliation : Princeton, Tsinghua
TL;DR : Sentence embedding. A paper that sets a new SOTA with a bi/dual encoder rather than a cross encoder.
Summary :
2 Background
Alignment and uniformity
Given a distribution of positive pairs p_pos, alignment computes the expected distance between the embeddings of paired instances.
Uniformity measures how uniformly the embeddings are distributed.
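For reference, these are the two metrics from Wang and Isola (2020) that SimCSE adopts; f is the (normalized) sentence encoder, and lower is better for both:

```latex
\ell_{\mathrm{align}} \triangleq
  \mathbb{E}_{(x, x^{+}) \sim p_{\mathrm{pos}}}
  \left\| f(x) - f(x^{+}) \right\|^{2}
\qquad
\ell_{\mathrm{uniform}} \triangleq
  \log \mathbb{E}_{x, y \overset{\text{i.i.d.}}{\sim} p_{\mathrm{data}}}
  e^{-2 \left\| f(x) - f(y) \right\|^{2}}
```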
3 Unsupervised SimCSE
Simply run the forward pass twice on the same sentence; the dropout masks differ between the two passes, and that alone is enough to form a positive pair. No other data augmentation is used.
" a minimal form of data augmentation"
4 Supervised SimCSE
Trained by taking entailment hypotheses from NLI as positives and contradiction hypotheses as hard negatives.
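A sketch of the supervised variant under the same assumptions as above (hypothetical `encoder`, not the released implementation): each premise is pulled toward its entailment hypothesis, while the other in-batch entailments and all contradiction hypotheses act as negatives.

```python
import torch
import torch.nn.functional as F

def simcse_sup_loss(encoder, premises, entailments, contradictions, temperature=0.05):
    h = encoder(premises)            # (N, d) anchor sentences
    h_pos = encoder(entailments)     # (N, d) positives (entailment hypotheses)
    h_neg = encoder(contradictions)  # (N, d) hard negatives (contradiction hypotheses)

    # Similarities of each anchor against all positives and all hard negatives.
    sim_pos = F.cosine_similarity(h.unsqueeze(1), h_pos.unsqueeze(0), dim=-1)  # (N, N)
    sim_neg = F.cosine_similarity(h.unsqueeze(1), h_neg.unsqueeze(0), dim=-1)  # (N, N)

    logits = torch.cat([sim_pos, sim_neg], dim=1) / temperature  # (N, 2N)
    labels = torch.arange(h.size(0), device=h.device)  # positive sits at column i
    return F.cross_entropy(logits, labels)
```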
5 Connection to Anisotropy
anisotropy problem
the learned embeddings occupy a narrow cone in the vector space, which largely limits their expressiveness
Gao et al. (2019) term it as a representation degeneration problem and demonstrate that language models trained with tied input/output embeddings lead to anisotropic word embeddings
Wang et al. (2020) show that the singular values of the word embedding matrix decay drastically. In other words, except for a few dominating singular values, all others are close to zero.
we show that the contrastive objective can inherently “flatten” the singular value distribution of the sentence-embedding matrix.
when minimizing the second term in Eq. 6, we are reducing the top eigenvalue of WW^T and inherently “flattening” the singular spectrum of the embedding space. Hence contrastive learning can potentially tackle the representation degeneration problem and improve the uniformity.
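Paraphrasing the paper's derivation for context (W stacks the m sentence embeddings h_i as rows, τ is the temperature): Jensen's inequality lower-bounds the second term of Eq. 6 by the average pairwise inner product, which is the element sum of WW^T, and when all entries of WW^T are positive that sum in turn upper-bounds the largest eigenvalue (Merikoski, 1984):

```latex
\mathbb{E}_{x}\Big[\log \mathbb{E}_{x^{-}}\big[e^{f(x)^{\top} f(x^{-})/\tau}\big]\Big]
= \frac{1}{m}\sum_{i=1}^{m}\log\Big(\frac{1}{m}\sum_{j=1}^{m} e^{h_i^{\top} h_j/\tau}\Big)
\;\ge\; \frac{1}{\tau m^{2}}\sum_{i=1}^{m}\sum_{j=1}^{m} h_i^{\top} h_j
= \frac{1}{\tau m^{2}}\operatorname{Sum}(WW^{\top})
\;\ge\; \frac{1}{\tau m^{2}}\lambda_{\max}(WW^{\top})
```

Strictly, this chain only makes the second term an upper bound on λ_max(WW^T) (up to the 1/(τm²) factor), which is the gap the next comment questions.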
-> Isn't it rather the opposite? The reasoning here feels like quite a logical leap.
6 Experiments
Why is MLM added as an auxiliary objective for the transfer tasks but not for STS? In any case, SimCSE + MLM gives the best performance there.
7 Analysis
SimCSE does well in terms of both uniformity and alignment.
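A small sketch of how these two analysis metrics can be computed from L2-normalized embeddings (my own illustration, not the paper's evaluation script; `x` and `y` are the embeddings of the two sides of positive pairs, each of shape (N, d)):

```python
import torch

def alignment(x, y, alpha=2):
    # Expected distance between embeddings of positive pairs (lower is better).
    return (x - y).norm(p=2, dim=1).pow(alpha).mean()

def uniformity(x, t=2):
    # Log of the mean pairwise Gaussian potential over all embeddings (lower is better).
    sq_dist = torch.pdist(x, p=2).pow(2)
    return sq_dist.mul(-t).exp().mean().log()
```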