Conference :
Link : https://arxiv.org/pdf/2104.08821.pdf
Authors' Affiliation : Princeton, Tsinghua
TL;DR : Sentence embedding. A paper that sets a new SOTA with a bi/dual encoder rather than a cross encoder.
Summary :
2 Background
Alignment and uniformity
Given a distribution of positive pairs p_pos, alignment computes the expected distance between the embeddings of paired instances.
Uniformity measures how uniformly the embeddings are distributed.
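For reference, these are the two metrics from Wang and Isola (2020) that SimCSE adopts; f is the (normalized) sentence encoder, and lower is better for both:

```latex
\ell_{\mathrm{align}} \triangleq
  \mathbb{E}_{(x, x^{+}) \sim p_{\mathrm{pos}}}
  \left\| f(x) - f(x^{+}) \right\|^{2}
\qquad
\ell_{\mathrm{uniform}} \triangleq
  \log \mathbb{E}_{x, y \overset{\text{i.i.d.}}{\sim} p_{\mathrm{data}}}
  e^{-2 \left\| f(x) - f(y) \right\|^{2}}
```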
3 Unsupervised SimCSE
Simply run the forward pass twice on the same sentence; the dropout masks differ between the two passes, and that alone is enough to form a positive pair. No other data augmentation is used.
" a minimal form of data augmentation"
4 Supervised SimCSE
Trained by taking entailment hypotheses from NLI as positives and contradiction hypotheses as hard negatives.
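A sketch of the supervised variant under the same assumptions as above (hypothetical `encoder`, not the released implementation): each premise is pulled toward its entailment hypothesis, while the other in-batch entailments and all contradiction hypotheses act as negatives.

```python
import torch
import torch.nn.functional as F

def simcse_sup_loss(encoder, premises, entailments, contradictions, temperature=0.05):
    h = encoder(premises)            # (N, d) anchor sentences
    h_pos = encoder(entailments)     # (N, d) positives (entailment hypotheses)
    h_neg = encoder(contradictions)  # (N, d) hard negatives (contradiction hypotheses)

    # Similarities of each anchor against all positives and all hard negatives.
    sim_pos = F.cosine_similarity(h.unsqueeze(1), h_pos.unsqueeze(0), dim=-1)  # (N, N)
    sim_neg = F.cosine_similarity(h.unsqueeze(1), h_neg.unsqueeze(0), dim=-1)  # (N, N)

    logits = torch.cat([sim_pos, sim_neg], dim=1) / temperature  # (N, 2N)
    labels = torch.arange(h.size(0), device=h.device)  # positive sits at column i
    return F.cross_entropy(logits, labels)
```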
5 Connection to Anisotropy
anisotropy problem
the learned embeddings occupy a narrow cone in the vector space, which largely limits their expressiveness
Gao et al. (2019) term it as a representation degeneration problem and demonstrate that language models trained with tied input/output embeddings lead to anisotropic word embeddings
Wang et al. (2020) show that the singular values of the word embedding matrix decay drastically. In other words, except for a few dominating singular values, all others are close to zero.
we show that the contrastive objective can inherently “flatten” the singular value distribution of the sentence-embedding matrix.
when minimizing the second term in Eq. 6, we are reducing the top eigenvalue of WW^T and inherently “flattening” the singular spectrum of the embedding space. Hence contrastive learning can potentially tackle the representation degeneration problem and improve the uniformity.
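Paraphrasing the paper's derivation for context (W stacks the m sentence embeddings h_i as rows, τ is the temperature): Jensen's inequality lower-bounds the second term of Eq. 6 by the average pairwise inner product, which is the element sum of WW^T, and when all entries of WW^T are positive that sum in turn upper-bounds the largest eigenvalue (Merikoski, 1984):

```latex
\mathbb{E}_{x}\Big[\log \mathbb{E}_{x^{-}}\big[e^{f(x)^{\top} f(x^{-})/\tau}\big]\Big]
= \frac{1}{m}\sum_{i=1}^{m}\log\Big(\frac{1}{m}\sum_{j=1}^{m} e^{h_i^{\top} h_j/\tau}\Big)
\;\ge\; \frac{1}{\tau m^{2}}\sum_{i=1}^{m}\sum_{j=1}^{m} h_i^{\top} h_j
= \frac{1}{\tau m^{2}}\operatorname{Sum}(WW^{\top})
\;\ge\; \frac{1}{\tau m^{2}}\lambda_{\max}(WW^{\top})
```

Strictly, this chain only makes the second term an upper bound on λ_max(WW^T) (up to the 1/(τm²) factor), which is the gap the next comment questions.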
-> Isn't it rather the opposite? The reasoning here feels like quite a logical leap.
6 Experiments
Why is MLM added as an auxiliary objective for the transfer tasks but not for STS? In any case, SimCSE + MLM gives the best performance there.
7 Analysis
SimCSE does well in terms of both uniformity and alignment.
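A small sketch of how these two analysis metrics can be computed from L2-normalized embeddings (my own illustration, not the paper's evaluation script; `x` and `y` are the embeddings of the two sides of positive pairs, each of shape (N, d)):

```python
import torch

def alignment(x, y, alpha=2):
    # Expected distance between embeddings of positive pairs (lower is better).
    return (x - y).norm(p=2, dim=1).pow(alpha).mean()

def uniformity(x, t=2):
    # Log of the mean pairwise Gaussian potential over all embeddings (lower is better).
    sq_dist = torch.pdist(x, p=2).pow(2)
    return sq_dist.mul(-t).exp().mean().log()
```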