nomic-ai / contrastors

Train Models Contrastively in Pytorch
Apache License 2.0

How to implement "fill the entire batch with samples from that single source" #39

Closed allhailzzq closed 3 months ago

allhailzzq commented 4 months ago

Hi there, thanks for sharing this great repo!

From your paper, I notice a paragraph says

"During training, we sample pairs from one data source at a time and fill the entire batch with samples from that single source to discourage the model from learning source-specific shortcuts."

However, reading src/contrastors/dataset/torch_loader.py, I did not find a corresponding setting, so I am wondering if I missed anything. Could you help me go through (or point out) the part of the script that implements this batching strategy? Thanks a lot!

zanussbaum commented 4 months ago

hey, thanks for the great question! I implemented a fairly custom dataloader so we could stream large amounts of data from the cloud without storing everything on disk. The bulk of the logic is here: https://github.com/nomic-ai/contrastors/blob/main/src/contrastors/dataset/torch_loader.py#L288-L294 We sample a path from the list of datasets, stream in the next part of the batch, and then update the pointer so that next time we know where to pick up from.
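For anyone skimming this thread, the sample-a-source / fill-the-batch / advance-the-pointer loop described above can be sketched roughly as follows. This is a minimal in-memory illustration, not the repo's actual streaming dataloader: the function and variable names (`make_single_source_batches`, `pointers`, etc.) are hypothetical, and real usage streams shards from the cloud rather than indexing Python lists.

```python
import random


def make_single_source_batches(sources, batch_size, num_batches, seed=0):
    """Yield batches where every sample comes from exactly one source.

    `sources` maps a source name to its list of examples (a stand-in for
    a streamed shard). One read pointer per source tracks where the next
    batch for that source should resume.
    """
    rng = random.Random(seed)
    names = list(sources)
    pointers = {name: 0 for name in sources}  # per-source resume position

    for _ in range(num_batches):
        name = rng.choice(names)              # sample one data source
        data = sources[name]
        start = pointers[name]
        if start + batch_size > len(data):    # source exhausted: wrap around
            start = 0
        batch = data[start:start + batch_size]
        pointers[name] = start + batch_size   # update the pointer
        yield name, batch
```

Because every batch is drawn from a single source, the in-batch negatives all share that source's distribution, which is what discourages the model from using source-specific shortcuts to separate positives from negatives.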