vfdev-5 opened this issue 3 years ago
@vfdev-5 Are you suggesting enabling that field for all our examples that utilize the DataLoader?
@Moh-Yakoub I'm not yet sure exactly what we'd like to do. We also have to check the "horovod" and "xla" backends with `persistent_workers=True`.
The performance gap is quite impressive. However, I don't clearly understand the reason why. If I understand correctly, the option keeps the dataloader's worker processes alive across epochs, but I don't see why that would accelerate the computation.
@sdesrozis there is no acceleration of the computation itself, actually; in practice the app spends a lot of time preparing the first batch of each epoch.
10s per epoch lost to destroying and re-allocating workers (and their memory), over 24 epochs? That seems like a very high cost.
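The lifetime pattern under discussion can be sketched with a stdlib-only analogy. Note the assumptions: `DataLoader` workers are actually separate *processes*, and the function names below are invented for illustration; threads here only stand in for the create-per-epoch vs. reuse-across-epochs pattern.

```python
from concurrent.futures import ThreadPoolExecutor

def load_sample(i):
    # stand-in for per-sample loading/augmentation work
    return i * i

def epochs_fresh_workers(n_epochs, n_samples):
    # persistent_workers=False analogue: the pool is torn down
    # and rebuilt at every epoch, paying the startup cost each time
    out = []
    for _ in range(n_epochs):
        with ThreadPoolExecutor(max_workers=2) as ex:
            out.append(list(ex.map(load_sample, range(n_samples))))
    return out

def epochs_persistent_workers(n_epochs, n_samples):
    # persistent_workers=True analogue: one pool lives across epochs,
    # so the startup cost is paid exactly once
    out = []
    with ThreadPoolExecutor(max_workers=2) as ex:
        for _ in range(n_epochs):
            out.append(list(ex.map(load_sample, range(n_samples))))
    return out
```

Both variants yield identical data; only the worker lifetime differs. With real `DataLoader` workers the per-epoch startup cost also includes forking/spawning processes and, on spawn, re-importing modules, which is where the time reportedly goes.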
🚀 Feature

Since v1.7.0, PyTorch provides a `persistent_workers` argument to `DataLoader`: https://pytorch.org/docs/1.7.0/data.html#torch.utils.data.DataLoader

This can reduce the dataloader re-creation time at each epoch. With the native PyTorch distributed config and `persistent_workers=True`, our cifar10 example gives

```
CIFAR10-Training INFO: Engine run complete. Time taken: 00:02:55
```

on 2 spawned procs (`idist.spawn`). With `persistent_workers=False`, the same example gives

```
CIFAR10-Training INFO: Engine run complete. Time taken: 00:12:08
```

We have to take that into account for the related dist backends when using `auto_dataloader`.
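A minimal sketch of how a helper could honor the flag. Assumptions are loud here: `DataLoaderStub` and this `auto_dataloader` are hypothetical stand-ins written for illustration, not ignite's actual implementation; only the constraint that `persistent_workers=True` requires `num_workers > 0` mirrors real `torch.utils.data.DataLoader` behavior.

```python
class DataLoaderStub:
    # stdlib stand-in for torch.utils.data.DataLoader, so the
    # sketch runs without torch installed
    def __init__(self, dataset, num_workers=0, persistent_workers=False, **kw):
        if persistent_workers and num_workers == 0:
            # mirrors torch's constraint: persistent workers need workers
            raise ValueError("persistent_workers option needs num_workers > 0")
        self.dataset = dataset
        self.num_workers = num_workers
        self.persistent_workers = persistent_workers

def auto_dataloader(dataset, **kwargs):
    # hypothetical policy: default persistent_workers to True, but only
    # when there are actually workers to persist, and let an explicit
    # user-passed value win
    if kwargs.get("num_workers", 0) > 0:
        kwargs.setdefault("persistent_workers", True)
    return DataLoaderStub(dataset, **kwargs)
```

With this policy, `auto_dataloader(ds, num_workers=2)` turns the flag on, `auto_dataloader(ds)` leaves it off (avoiding the torch error), and an explicit `persistent_workers=False` is respected.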