libffcv / ffcv

FFCV: Fast Forward Computer Vision (and other ML workloads!)
https://ffcv.io
Apache License 2.0

Cannot find how DDP DistributedSampler works in ffcv #122

Closed Kwentar closed 2 years ago

Kwentar commented 2 years ago

Hi, in DDP training we have DistributedSampler, which has a .set_epoch method (https://pytorch.org/docs/stable/data.html). It is used like this:

>>> for epoch in range(start_epoch, n_epochs):
...     if is_distributed:
...         sampler.set_epoch(epoch)
...     train(loader)

How does it work here? Do we need to do something for each epoch, or can we just remove sampler.set_epoch when migrating to ffcv?

GuillaumeLeclerc commented 2 years ago

Hi @Kwentar

We do not require that in FFCV; just use the loader normally and drop the sampler.set_epoch call. Please refer to our ImageNet example, which supports distributed training.
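
For context, here is a minimal sketch of why `set_epoch` exists in PyTorch and why a loader can make it unnecessary. PyTorch's `DistributedSampler` seeds its shuffle with `seed + epoch`, so without `set_epoch` every epoch replays the same order. A loader that counts epochs itself each time it is iterated gets the same effect with no extra call. The names `shard_indices` and `EpochAwareLoader` are hypothetical and use only the standard library; this is an illustration of the idea, not FFCV's actual implementation.

```python
import random


def shard_indices(n_samples, rank, world_size, seed, epoch):
    """Return the sample indices one rank sees in one epoch.

    Every rank shuffles with the same seed + epoch, so all ranks agree on
    the permutation and take disjoint strided slices of it.
    """
    order = list(range(n_samples))
    random.Random(seed + epoch).shuffle(order)
    return order[rank::world_size]


class EpochAwareLoader:
    """Hypothetical loader that advances its own epoch counter."""

    def __init__(self, n_samples, rank, world_size, seed=0):
        self.n_samples = n_samples
        self.rank = rank
        self.world_size = world_size
        self.seed = seed
        self.next_epoch = 0

    def __iter__(self):
        # Each new pass over the loader uses the next epoch number,
        # so the caller never has to call set_epoch().
        epoch = self.next_epoch
        self.next_epoch += 1
        return iter(shard_indices(self.n_samples, self.rank,
                                  self.world_size, self.seed, epoch))


if __name__ == "__main__":
    # Simulate two ranks iterating for two epochs.
    loaders = [EpochAwareLoader(8, rank, world_size=2) for rank in range(2)]
    for epoch in range(2):
        for rank, loader in enumerate(loaders):
            print(f"epoch {epoch} rank {rank}: {list(loader)}")
```

With this pattern the training loop reduces to `for epoch in range(n_epochs): train(loader)`, which is the shape of loop the reply above describes.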