Open msaroufim opened 2 years ago
Just a note: MultiProcessingReadingService (MPRS) is a temporary reading service. We will change it to the PrototypeMultiProcessingReadingService. For prefetch_factor, we might provide it as DataPipes to let users define it in their pipeline. And we have decided to make pin_memory an adapt_fn. See: https://github.com/pytorch/data/pull/485

For worker_init_fn, I think we still need it so that users can control the state of the worker process if they have special use cases.

I like the idea of providing a reasonable number of workers by default, since it makes no sense to pass num_workers=0 to MPRS just to achieve single-process iteration.
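To make the discussion concrete, here is a minimal sketch of what "reasonable defaults" could look like, expressed as the familiar DataLoader keyword arguments the issue cites. The helper name and the heuristics (one worker per CPU core, pin_memory only when a CUDA device is in use, persistent workers whenever there are workers) are illustrative assumptions, not an agreed-upon policy:

```python
import os

def suggested_loader_kwargs(cuda_available: bool = False) -> dict:
    """Hypothetical helper returning commonly recommended DataLoader
    keyword arguments. The key names match torch.utils.data.DataLoader's
    parameters; the values encode the tuning tips discussed in this issue.
    """
    # one worker per CPU core is the usual starting point
    num_workers = os.cpu_count() or 0
    return {
        "num_workers": num_workers,
        # pinning host memory only helps when copying batches to a CUDA device
        "pin_memory": cuda_available,
        # keep workers alive between epochs to avoid re-spawn overhead
        "persistent_workers": num_workers > 0,
    }
```

A caller would then do something like DataLoader(dataset, **suggested_loader_kwargs(torch.cuda.is_available())); the point is only that these three values could be derived automatically instead of defaulting to 0/False/False.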
🚀 The feature
I can add these; I'm opening this issue to discuss whether it's a good idea to change the defaults.
+: Users get better out-of-the-box performance with torchdata
-: backward-compatibility issues when moving from dataloaderv1 to dataloaderv2
Motivation, pitch
There are many issues on the discussion forums, Stack Overflow, and blog posts describing how people should configure data loaders for optimal performance. A lot of the tricks haven't changed, like pin_memory = true or num_workers = num_cpu_cores or persistent_workers = true, and since we're in the process of developing dataloaderv2, now may be a good time to revisit these default values.
Alternatives
A linter.py that suggests some of these tips when we notice likely sources of slowdown
Additional context
No response
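As a sketch of the linter.py alternative mentioned above: a pass over the user's loader configuration could emit the same tips this issue lists. Everything here (function name, thresholds, messages) is an illustrative assumption about what such a linter might check, not an existing torchdata API:

```python
import os

def lint_loader_kwargs(kwargs: dict) -> list:
    """Hypothetical linter pass flagging DataLoader settings that are
    commonly associated with slow input pipelines. Returns a list of
    human-readable tips; an empty list means nothing was flagged.
    """
    tips = []
    if kwargs.get("num_workers", 0) == 0:
        # loading in the main process serializes I/O with training
        tips.append(
            "num_workers=0 runs loading in the main process; "
            f"consider num_workers={os.cpu_count()}."
        )
    if not kwargs.get("pin_memory", False):
        tips.append("pin_memory=True can speed up host-to-GPU copies.")
    if kwargs.get("num_workers", 0) > 0 and not kwargs.get("persistent_workers", False):
        tips.append("persistent_workers=True avoids re-spawning workers each epoch.")
    return tips
```

For example, lint_loader_kwargs({}) flags both the worker count and pin_memory, while a fully tuned configuration produces no tips.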