princeton-nlp / SimCSE

[EMNLP 2021] SimCSE: Simple Contrastive Learning of Sentence Embeddings https://arxiv.org/abs/2104.08821
MIT License
3.37k stars 511 forks

How to train your model to better fit SentEval #241

Closed ZBWpro closed 1 year ago

ZBWpro commented 1 year ago

Hi~

SentEval requires users to implement a function called "batcher(params, batch)".

For STS tasks, the "batch" argument passed to batcher contains only one sentence from each sentence pair at a time.

This is a mismatch, as your model typically takes the entire sentence pair as input.

If you convert the sentences of a pair to embeddings one at a time, forward must be called twice per pair, which is poorly supported by DDP.
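To illustrate the mismatch, here is a minimal sketch of the batcher interface SentEval expects: each call receives a flat list of tokenized sentences covering only one side of the pairs, so no pair structure is visible. `ToyEncoder` is a hypothetical stand-in, not the SimCSE API.

```python
import numpy as np

class ToyEncoder:
    """Hypothetical stand-in for a sentence encoder (not the SimCSE API)."""
    def encode(self, sentences):
        # Toy embedding: character count and word count as a 2-d vector.
        return np.array([[len(s), len(s.split())] for s in sentences],
                        dtype=float)

def batcher(params, batch):
    # SentEval passes `batch` as a list of tokenized sentences; for STS
    # tasks, each call covers only one side of the sentence pairs.
    sentences = [" ".join(tokens) for tokens in batch]
    return params["encoder"].encode(sentences)
```

A model whose forward expects both sentences of a pair therefore has to be invoked once per side from inside this function.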

github-actions[bot] commented 1 year ago

Stale issue message

gaotianyu1350 commented 1 year ago

Hi,

Sorry about the late reply. I'm not sure I totally follow. SentEval is only used for evaluation, so it shouldn't affect DDP.

ZBWpro commented 1 year ago

Thanks for your reply. I found that I could solve this problem by adding a separate encoding function to the model class and calling it several times inside forward.
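For anyone hitting the same issue, the fix described above might look like the following sketch: a shared `encode` helper does the per-sentence work, and forward calls it once per side. Since DDP only wraps forward, gradient synchronization still happens in a single pass. `PairModel` and its layers are hypothetical, not taken from this repo.

```python
import torch
import torch.nn as nn

class PairModel(nn.Module):
    """Hedged sketch of a tiny bi-encoder. DDP wraps forward(), so
    calling the shared encode helper twice inside one forward keeps
    gradient synchronization to a single pass per pair batch."""
    def __init__(self, dim=8):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def encode(self, x):
        # Shared per-sentence encoder, reused for both sides of the pair.
        return self.proj(x)

    def forward(self, sent_a, sent_b):
        emb_a = self.encode(sent_a)
        emb_b = self.encode(sent_b)
        # Cosine similarity between the two sentence embeddings.
        return nn.functional.cosine_similarity(emb_a, emb_b, dim=-1)
```

During evaluation, `encode` can also be called directly from a SentEval batcher, one side at a time, without touching forward.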