graphcore / poptorch

PyTorch interface for the IPU
https://docs.graphcore.ai/projects/poptorch-user-guide/en/latest/
MIT License

Dataloader sampler support #6

Open · Lime-Cakes opened 1 year ago

Lime-Cakes commented 1 year ago

Is it possible to use the dataloader with a custom sampler/batch_sampler? At the moment, I cannot find any useful information on using poptorch's dataloader with a custom sampler. Are there plans to support this, or are custom samplers impossible due to the IPU design?

Edit: At the moment, using a custom batch_sampler results in the following error:

Traceback (most recent call last):
  File "train-ipu.py", line 488, in <module>
    main()
  File "train-ipu.py", line 446, in main
    train_dataloader = poptorch.DataLoader(opts,train_dataset,collate_fn=collate_fn, batch_sampler=box_sampler)
  File "/usr/local/lib/python3.8/dist-packages/poptorch/__init__.py", line 356, in __init__
    super().__init__(dataset,
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/dataloader.py", line 251, in __init__
    raise ValueError('batch_sampler option is mutually exclusive '
ValueError: batch_sampler option is mutually exclusive with batch_size, shuffle, sampler, and drop_last
AnthonyBarbier commented 1 year ago

batch_sampler is currently not supported by poptorch.DataLoader, but you could use one with a stock torch.utils.data.DataLoader. However, you need to make sure each batch the sampler returns matches the combined batch size expected by the PopTorch model (see the sketch below).

Here is how the combined batch size is computed:

            self._combined_batch_size = batch_size * \
                options.device_iterations * \
                options.replication_factor * \
                options.Training.gradient_accumulation

Source: https://github.com/graphcore/poptorch/blob/sdk-release-3.0/python/__init__.py#L278
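
For illustration, here is a minimal sketch of that workaround (not from the PopTorch docs): a stock torch.utils.data.DataLoader driven by a custom batch sampler that yields batches of exactly the combined batch size. RandomDataset and FixedSizeBatchSampler are hypothetical placeholders, and the four values below are example numbers standing in for the corresponding poptorch.Options settings quoted above.

import torch
from torch.utils.data import DataLoader, Dataset, Sampler

class RandomDataset(Dataset):
    # Hypothetical dataset: 1024 feature vectors of length 8.
    def __len__(self):
        return 1024

    def __getitem__(self, idx):
        return torch.randn(8)

class FixedSizeBatchSampler(Sampler):
    # Hypothetical batch sampler: yields shuffled index lists of exactly
    # batch_size elements, dropping the last incomplete batch.
    def __init__(self, num_samples, batch_size):
        self.num_samples = num_samples
        self.batch_size = batch_size

    def __iter__(self):
        perm = torch.randperm(self.num_samples).tolist()
        for start in range(0, self.num_samples - self.batch_size + 1,
                           self.batch_size):
            yield perm[start:start + self.batch_size]

    def __len__(self):
        return self.num_samples // self.batch_size

# Mirror the combined batch size computation quoted above; in real code
# these values would come from your poptorch.Options instance.
batch_size = 4               # per-step batch size seen by the model
device_iterations = 2        # options.device_iterations
replication_factor = 2       # options.replication_factor
gradient_accumulation = 8    # options.Training.gradient_accumulation
combined_batch_size = (batch_size * device_iterations *
                       replication_factor * gradient_accumulation)  # = 128

dataset = RandomDataset()
loader = DataLoader(dataset,
                    batch_sampler=FixedSizeBatchSampler(len(dataset),
                                                        combined_batch_size))

for batch in loader:
    # Each batch carries the full combined batch size, which is the shape
    # a PopTorch model compiled with these options expects to receive.
    assert batch.shape == (combined_batch_size, 8)

Because the stock DataLoader hands over the whole combined batch at once, PopTorch should then be able to split it across device iterations, replicas, and gradient accumulation steps just as it would for batches produced by poptorch.DataLoader.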