Closed — gangiswag closed this issue 4 days ago
For the weakly supervised contrastive pretraining, yes, we only use in-batch negatives. For fine-tuning we do use hard negatives. The link above is old code; instead, the negatives get added alongside the documents: https://github.com/nomic-ai/contrastors/blob/main/src/contrastors/dataset/text_text_loader.py#L419
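For intuition, here is a minimal sketch (not the repo's actual code; the function name, similarity values, and temperature are illustrative) of how negatives appended alongside the documents would enter an InfoNCE-style contrastive loss. Each query's positive sits on the diagonal of the query-document similarity matrix; every other column in that row, whether another query's document (in-batch negative) or an appended hard negative, contributes to the denominator:

```python
import math

def contrastive_loss(query_doc_sims, temperature=0.07):
    """InfoNCE-style loss over a [num_queries x num_docs] similarity matrix.

    Row i holds query i's similarities: its positive document is at
    column i, and any hard negatives are appended as extra columns by
    the dataloader. Every non-positive column in a row acts as a
    negative, whether in-batch or hard.
    """
    n = len(query_doc_sims)
    total = 0.0
    for i, row in enumerate(query_doc_sims):
        logits = [s / temperature for s in row]
        # log-sum-exp with max subtraction for numerical stability
        m = max(logits)
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_z - logits[i]  # -log p(positive | query i)
    return total / n

# 2 queries; columns 0-1 are the positives (diagonal), column 2 is a
# hard negative appended alongside the documents.
sims = [[0.9, 0.1, 0.3],
        [0.2, 0.8, 0.4]]
loss = contrastive_loss(sims)
```

With this layout, dropping the extra columns recovers the in-batch-only case, which is why concatenating hard negatives to the document side of the batch is enough to include them in the loss.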
There's a bunch of bad indirection here that should be cleaned up eventually.
Ah, I see, so the negatives are added in the dataloader itself. Thanks for confirming!
I'm looking at the code, and it seems like the per-query negatives are never used when calculating the loss: https://github.com/nomic-ai/contrastors/blob/e326624a4fb531fad15d099d1d310547a62d275d/src/contrastors/trainers/text_text.py#L194C13-L194C29
So the only negatives during contrastive loss are the in-batch negatives?