Feature request
interleave_datasets and RandomlyCyclingMultiSourcesExamplesIterable let us sample individual examples from multiple sources. Can we also sample batches in a similar manner, so that each batch contains data from only one source?
Motivation
Some recent research [1, 2] shows that source-homogeneous batching can be helpful for contrastive learning. Can we add an iterable called RandomlyCyclingMultiSourcesBatchesIterable to support this functionality?
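To make the request concrete, here is a minimal sketch of the behavior I have in mind, written against plain Python iterables rather than the library's internals. The function name `randomly_cycling_batches` and its parameters (`probabilities`, `seed`, `drop_last`) are hypothetical and only for illustration; this is not how `datasets` implements anything today. It also assumes sampling continues until every source is exhausted, which may not be the stopping rule we would want by default.

```python
import random
from itertools import islice
from typing import Any, Iterable, Iterator, List, Optional, Sequence


def randomly_cycling_batches(
    sources: Sequence[Iterable[Any]],
    batch_size: int,
    probabilities: Optional[Sequence[float]] = None,
    seed: int = 0,
    drop_last: bool = False,
) -> Iterator[List[Any]]:
    """Yield batches such that each batch comes from exactly one source.

    Hypothetical helper: at each step a source is picked at random
    (weighted by `probabilities`, which are assumed to be positive),
    and up to `batch_size` examples are pulled from that source only.
    """
    rng = random.Random(seed)
    iterators = [iter(source) for source in sources]
    if probabilities is None:
        weights = [1.0] * len(iterators)
    else:
        weights = list(probabilities)
    active = list(range(len(iterators)))  # indices of non-exhausted sources

    while active:
        # Pick one source for the whole batch.
        idx = rng.choices(active, weights=[weights[i] for i in active], k=1)[0]
        batch = list(islice(iterators[idx], batch_size))
        if len(batch) < batch_size:
            # A short (or empty) batch means this source is exhausted.
            active.remove(idx)
            if not batch or drop_last:
                continue
        yield batch


if __name__ == "__main__":
    sources = [
        [("wiki", i) for i in range(5)],
        [("code", i) for i in range(3)],
    ]
    for batch in randomly_cycling_batches(sources, batch_size=2, seed=0):
        print(batch)
```

Every printed batch contains examples tagged with a single source name; the order in which sources are visited depends on the seed.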
Your contribution
I can contribute a PR. But I wonder what the best way is to test its correctness and robustness.
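One possible starting point for testing, as a sketch only: check invariants on the produced batches instead of exact outputs, so the test does not depend on the random order. The helper `check_source_homogeneous` and its `source_of` callback are hypothetical names, and the sketch assumes examples are hashable and that all sources are consumed to exhaustion.

```python
from collections import Counter
from typing import Any, Callable, Hashable, Iterable, List


def check_source_homogeneous(
    batches: Iterable[List[Hashable]],
    source_of: Callable[[Any], Hashable],
    expected_examples: Iterable[Hashable],
    batch_size: int,
) -> None:
    """Assert the invariants a source-homogeneous batcher should satisfy."""
    seen: Counter = Counter()
    for batch in batches:
        # Batches are non-empty and never larger than requested.
        assert 0 < len(batch) <= batch_size, "bad batch size"
        # All examples in one batch share the same source.
        assert len({source_of(ex) for ex in batch}) == 1, "mixed-source batch"
        seen.update(batch)
    # Every example is produced exactly once, none lost or duplicated.
    assert seen == Counter(expected_examples), "examples lost or duplicated"


if __name__ == "__main__":
    examples = [("wiki", 0), ("wiki", 1), ("code", 0)]
    good = [[("wiki", 0), ("wiki", 1)], [("code", 0)]]
    check_source_homogeneous(good, lambda ex: ex[0], examples, batch_size=2)
    print("invariants hold")
```

Running the same checks over several seeds, batch sizes, and uneven source lengths (including an empty source) would cover most of the robustness cases I can think of.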