princeton-nlp / SimCSE

[EMNLP 2021] SimCSE: Simple Contrastive Learning of Sentence Embeddings https://arxiv.org/abs/2104.08821
MIT License

Why not initialize the pooler and the LM head from the pretrained model? #179

Closed · Doragd closed this issue 2 years ago

Doragd commented 2 years ago

Hi, I noticed that the pooler layer and the LM head are newly initialized. Why not initialize them with the weights from the pre-trained model?

https://github.com/princeton-nlp/SimCSE/blob/5005c3daab99cb9f6ff92c526ab751079a169826/simcse/models.py#L281
https://github.com/princeton-nlp/SimCSE/blob/5005c3daab99cb9f6ff92c526ab751079a169826/simcse/models.py#L284
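
For context, this is roughly the alternative the question suggests. A hedged sketch (not the repository's code, using the Hugging Face `transformers` API): copy BERT's pretrained pooler weights into a new head instead of leaving it randomly initialized.

```python
import torch
import torch.nn as nn
from transformers import BertModel

# Load BERT; its built-in pooler was trained with next sentence prediction (NSP).
bert = BertModel.from_pretrained("bert-base-uncased")

# What the question proposes: seed a new head with the pretrained pooler weights
# rather than starting it from random initialization.
new_pooler = nn.Linear(bert.config.hidden_size, bert.config.hidden_size)
with torch.no_grad():
    new_pooler.weight.copy_(bert.pooler.dense.weight)
    new_pooler.bias.copy_(bert.pooler.dense.bias)
```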

gaotianyu1350 commented 2 years ago

Hi,

The BERT pooler layer was trained with the "next sentence prediction" (NSP) objective, which we found in our preliminary experiments does not help the sentence embedding task. So we simply re-initialize it for a clean setup.
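
To illustrate that choice, here is a minimal sketch of training a freshly initialized dense + tanh head over the `[CLS]` representation instead of reusing the NSP-trained pooler. `MLPPooler` is a hypothetical name standing in for the head defined in `simcse/models.py`; the Hugging Face model and tokenizer calls are assumed.

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

bert = AutoModel.from_pretrained("bert-base-uncased")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

class MLPPooler(nn.Module):
    """Dense + tanh head over [CLS], randomly initialized rather than
    copied from BERT's NSP-trained pooler."""
    def __init__(self, hidden_size: int):
        super().__init__()
        self.dense = nn.Linear(hidden_size, hidden_size)  # fresh random init
        self.activation = nn.Tanh()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        cls = hidden_states[:, 0]  # [CLS] token representation
        return self.activation(self.dense(cls))

pooler = MLPPooler(bert.config.hidden_size)

inputs = tokenizer("A sentence to embed.", return_tensors="pt")
with torch.no_grad():
    last_hidden = bert(**inputs).last_hidden_state
embedding = pooler(last_hidden)
```

Since the head is trained jointly during contrastive fine-tuning anyway, starting it from random weights avoids carrying over NSP-specific behavior that the authors found unhelpful for sentence embeddings.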

Doragd commented 2 years ago

Thanks~