As raised during a discussion, there is no point in masking padding tokens if we create them by filling with zeros (they will always yield a cosine similarity of 0), so I removed this part from the colbert_score function.
During training, however, the padding positions are not zeros but embeddings produced by the model for the padding tokens, so I kept the masking there, but changed it to rely on broadcasting instead of materializing the mask explicitly, which saves some VRAM.
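To illustrate the broadcasting approach, here is a minimal sketch of a MaxSim-style late-interaction score with a broadcast padding mask. The function name, tensor shapes, and NumPy usage are my own assumptions for illustration, not the actual implementation: instead of expanding the mask to the full `(B, Lq, Ld)` similarity shape, the `(B, 1, Ld)` view is broadcast against it, so no extra mask tensor is allocated.

```python
import numpy as np

def late_interaction_score(q_emb, d_emb, d_mask):
    """Hypothetical MaxSim scoring with a broadcast padding mask.

    q_emb:  (B, Lq, D) query token embeddings
    d_emb:  (B, Ld, D) document token embeddings
    d_mask: (B, Ld)    True for real tokens, False for padding
    """
    # Token-level similarity matrix: (B, Lq, Ld)
    sim = np.einsum("bqd,bkd->bqk", q_emb, d_emb)
    # Broadcast the (B, 1, Ld) mask over the query axis instead of
    # materializing a full (B, Lq, Ld) mask tensor.
    sim = np.where(d_mask[:, None, :], sim, -np.inf)
    # MaxSim: best document token per query token, summed over queries.
    return sim.max(axis=-1).sum(axis=-1)
```

With zero-filled padding the `np.where` step would be redundant (those columns are already 0), which is exactly why the inference-side masking could be dropped; it only matters when padding positions carry model-produced embeddings.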