vfdev-5 opened this issue 3 years ago
@vfdev-5 Are you suggesting enabling that field for all our examples that utilize the DataLoader?
@Moh-Yakoub I'm not yet sure exactly what we'd like to do. We also have to check the "horovod" and "xla" backends with `persistent_workers=True`.
The performance gap is quite impressive. However, I don't clearly understand the reason why. If I understand correctly, the option keeps the dataloader's worker processes alive across epochs, but I don't see why that would accelerate the computation.
@sdesrozis there is no acceleration of the computation itself, actually; in practice the app spends a lot of time preparing the first batch of each epoch.
10s per epoch lost to destroying and re-allocating workers (and their memory), over 24 epochs? That seems like a very high cost.
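The lifetime pattern under discussion can be sketched with a stdlib-only analogy. Note the assumptions: `DataLoader` workers are actually separate *processes*, and the function names below are invented for illustration; threads here only stand in for the create-per-epoch vs. reuse-across-epochs pattern.

```python
from concurrent.futures import ThreadPoolExecutor

def load_sample(i):
    # stand-in for per-sample loading/augmentation work
    return i * i

def epochs_fresh_workers(n_epochs, n_samples):
    # persistent_workers=False analogue: the pool is torn down
    # and rebuilt at every epoch, paying the startup cost each time
    out = []
    for _ in range(n_epochs):
        with ThreadPoolExecutor(max_workers=2) as ex:
            out.append(list(ex.map(load_sample, range(n_samples))))
    return out

def epochs_persistent_workers(n_epochs, n_samples):
    # persistent_workers=True analogue: one pool lives across epochs,
    # so the startup cost is paid exactly once
    out = []
    with ThreadPoolExecutor(max_workers=2) as ex:
        for _ in range(n_epochs):
            out.append(list(ex.map(load_sample, range(n_samples))))
    return out
```

Both variants yield identical data; only the worker lifetime differs. With real `DataLoader` workers the per-epoch startup cost also includes forking/spawning processes and, on spawn, re-importing modules, which is where the time reportedly goes.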
🚀 Feature

Since v1.7.0, PyTorch provides a `persistent_workers` argument to `DataLoader`: https://pytorch.org/docs/1.7.0/data.html#torch.utils.data.DataLoader

This can reduce the dataloader re-creation time at each epoch. With the native PyTorch distributed config and `persistent_workers=True`, our cifar10 example gives

```
CIFAR10-Training INFO: Engine run complete. Time taken: 00:02:55
```

on 2 spawned procs (`idist.spawn`). With `persistent_workers=False`, the same example gives

```
CIFAR10-Training INFO: Engine run complete. Time taken: 00:12:08
```

We have to take that into account for the related dist backends when using `auto_dataloader`.
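A minimal sketch of how a helper could honor the flag. Assumptions are loud here: `DataLoaderStub` and this `auto_dataloader` are hypothetical stand-ins written for illustration, not ignite's actual implementation; only the constraint that `persistent_workers=True` requires `num_workers > 0` mirrors real `torch.utils.data.DataLoader` behavior.

```python
class DataLoaderStub:
    # stdlib stand-in for torch.utils.data.DataLoader, so the
    # sketch runs without torch installed
    def __init__(self, dataset, num_workers=0, persistent_workers=False, **kw):
        if persistent_workers and num_workers == 0:
            # mirrors torch's constraint: persistent workers need workers
            raise ValueError("persistent_workers option needs num_workers > 0")
        self.dataset = dataset
        self.num_workers = num_workers
        self.persistent_workers = persistent_workers

def auto_dataloader(dataset, **kwargs):
    # hypothetical policy: default persistent_workers to True, but only
    # when there are actually workers to persist, and let an explicit
    # user-passed value win
    if kwargs.get("num_workers", 0) > 0:
        kwargs.setdefault("persistent_workers", True)
    return DataLoaderStub(dataset, **kwargs)
```

With this policy, `auto_dataloader(ds, num_workers=2)` turns the flag on, `auto_dataloader(ds)` leaves it off (avoiding the torch error), and an explicit `persistent_workers=False` is respected.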