lightonai / pylate

Late Interaction Models Training & Retrieval
https://lightonai.github.io/pylate/
MIT License

Support for in-batch negative loss #56

Closed dinhvietcuong closed 2 months ago

dinhvietcuong commented 2 months ago

Hi,

First of all, thank you for developing this super useful repo!

I saw that ColBERTv2 uses an in-batch negative loss during training, which I think is equivalent to the MultipleNegativesRankingLoss in sentence_transformers: https://github.com/UKPLab/sentence-transformers/blob/master/sentence_transformers/losses/MultipleNegativesRankingLoss.py.

May I ask if you plan to support this loss as well (alongside the default contrastive loss)?

Best regards, Cuong

NohTow commented 2 months ago

Hello,

You are right to note that ColBERTv2, in addition to distillation, leverages in-batch negatives (IBN). However, recent work by Benjamin Clavié has shown that IBN is not helpful when coupled with distillation. This makes sense: in contrastive learning, you use large batch sizes in the hope of encountering hard negatives, whereas with distillation you already have a pool of challenging documents mined before training. Besides, the learning signal is more granular, because you use the scores of a cross-encoder: instead of potentially having false positives/negatives, you get a stronger, more granular signal of positiveness/negativeness.

Thus, given that IBN is not helpful and is actually expensive (computing MaxSim across a whole batch is costly, unlike the single, simple dot product in dense models), we chose to implement distillation alone. If you want to do IBN, you can use the contrastive loss we defined, which also leverages the other documents in the batch as negatives (in addition to the sample's own negative) and is thus very similar to the function you linked.
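For reference, here is a minimal sketch of training with this contrastive loss, following the sentence-transformers-style training loop PyLate builds on; the checkpoint and dataset names are placeholders, so adapt them to your setup:

```python
from datasets import load_dataset
from sentence_transformers import SentenceTransformerTrainer
from pylate import losses, models, utils

# Placeholder base checkpoint; any BERT-like model works as a starting point.
model = models.ColBERT(model_name_or_path="bert-base-uncased")

# PyLate's contrastive loss: the positive is contrasted against the sample's
# negative and against every other document in the batch (IBN).
train_loss = losses.Contrastive(model=model)

# Placeholder triplet dataset with (query, positive, negative) columns.
train_dataset = load_dataset("sentence-transformers/msmarco-bm25", "triplet", split="train")

trainer = SentenceTransformerTrainer(
    model=model,
    train_dataset=train_dataset,
    loss=train_loss,
    data_collator=utils.ColBERTCollator(model.tokenize),
)
trainer.train()
```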

And if you really want to use IBN + distillation, you can easily merge the two pieces of code into your own loss function and use it for training; that's the benefit of having a modular architecture!
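As an illustration only, a rough sketch of what such a merged loss could look like; the class name and the alpha weighting are mine, not part of PyLate, and I am assuming both losses share the (sentence_features, labels) forward signature of sentence-transformers losses. In practice, the distillation loss expects teacher scores as labels while the contrastive loss works on triplets, so you would need a batch format that carries both:

```python
import torch

from pylate import losses

class DistillationWithIBN(torch.nn.Module):
    """Hypothetical merged loss (not part of PyLate): a weighted sum of the
    distillation and contrastive (IBN) signals computed on the same batch."""

    def __init__(self, model, alpha: float = 0.5):
        super().__init__()
        self.distillation = losses.Distillation(model=model)
        self.contrastive = losses.Contrastive(model=model)
        self.alpha = alpha  # trade-off between the two signals

    def forward(self, sentence_features, labels):
        # Distillation aligns the ColBERT scores with the teacher scores (labels).
        distill = self.distillation(sentence_features, labels)
        # Contrastive adds the in-batch negatives signal on the same batch
        # (drop the labels argument if your version of the loss ignores it).
        ibn = self.contrastive(sentence_features, labels)
        return self.alpha * distill + (1.0 - self.alpha) * ibn
```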

dinhvietcuong commented 2 months ago

Hi NohTow,

Thanks for the quick response and the clear explanation. It seems that the contrastive loss you defined in PyLate is indeed what I am looking for.

I was confused by the contrastive loss defined in sbert (https://github.com/UKPLab/sentence-transformers/blob/master/sentence_transformers/losses/ContrastiveLoss.py), which does not include IBN. All clear to me now!