Hey guys! I might be mistaken, but I think that with the way the samplers are currently implemented, when using a distributed backend (such as ddp or ddp-sharded) every accelerator (GPU) ends up sampling the same examples.
Instead of inheriting from torch.utils.data.Sampler, I suggest inheriting from torch.utils.data.distributed.DistributedSampler and partitioning the data across accelerators, doing something like this:
from torch.utils.data.distributed import DistributedSampler


class RandomSampler(DistributedSampler):
    r"""
    Implementation of a Random Sampler for sampling the dataset.

    Args:
        data_source (torch.utils.data.Dataset): dataset to sample from
        batch_size (int): size of each batch
        drop_last (bool): flag indicating whether to drop the last incomplete batch
    """

    def __init__(self, data_source, batch_size: int = 32, drop_last: bool = True) -> None:
        super(RandomSampler, self).__init__(data_source, drop_last=drop_last)
        self.data_source = data_source
        self.batch_size = batch_size
        self.drop_last = drop_last
        # Give each rank a disjoint, contiguous shard of the indices.
        ids = list(range(len(data_source)))
        start = int(len(data_source) * self.rank / self.num_replicas)
        end = int(len(data_source) * (self.rank + 1) / self.num_replicas)
        shard = ids[start:end]
        # Split the shard into batches of indices.
        self.bins = [shard[i:i + batch_size] for i in range(0, len(shard), batch_size)]
        if self.drop_last and self.bins and len(self.bins[-1]) < batch_size:
            self.bins = self.bins[:-1]

    def __iter__(self):
        # Yield one batch of indices at a time (batch-sampler style).
        for ids in self.bins:
            yield ids

    def __len__(self):
        return len(self.bins)
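For context, here is a minimal usage sketch (not part of the proposal itself). It assumes torch.distributed is already initialized so that DistributedSampler can infer rank and num_replicas from the default process group, and the TensorDataset is just a placeholder. Since __iter__ yields lists of indices, the sampler is passed as batch_sampler:

import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(1000, 16))   # placeholder dataset
sampler = RandomSampler(dataset, batch_size=32)  # assumes the process group is initialized

# The sampler yields lists of indices, so it plugs in as a batch_sampler.
loader = DataLoader(dataset, batch_sampler=sampler)

for (batch,) in loader:
    ...  # each rank iterates over its own disjoint shard of the data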