pytorch / data

A PyTorch repo for data loading and utilities to be shared by the PyTorch domain libraries.
BSD 3-Clause "New" or "Revised" License
1.13k stars 152 forks source link

Set better defaults for `MultiProcessingReadingService` #705

Open msaroufim opened 2 years ago

msaroufim commented 2 years ago

🚀 The feature

class MultiProcessingReadingService(ReadingServiceInterface):
    num_workers: int = get_number_of_cpu_cores()
    pin_memory: bool = True
    timeout: float
    worker_init_fn: Optional[Callable[[int], None]] # Remove this?
    prefetch_factor: int = profile_optimal_prefetch_factor(model : nn.Module)
    persistent_workers: bool = True

I can add these, opening this issue to discuss whether it's a good idea to change defaults.

+: Users get better out of the box performance with torchdata -: backward compatibility issues when moving from dataloaderv1 to dataloaderv2

Motivation, pitch

There are many issues on discuss, stack overflow, and blogs describing how people should configure data loaders for optimized performance. Since a lot of the tricks haven't changed like pin_memory = true or num_workers = num_cpu_cores or persistent_workers=true and since we're in the process of developing dataloaderv2 now may be a good time to revisit these default values

Alternatives

  1. Instead of setting reasonable defaults, we can instead extend the linter.py to suggest some of these tips if we notice some sources of slowdowns
  2. Do nothing, suggest people read documentation when configuring performance

Additional context

No response

ejguan commented 2 years ago

Just a note: MultiProcessingReadingService (MPRS) is a temporary reading service. We will change it to the the PrototypeMultiProcessingReadingService. And, for prefetch_factor, we might provide it as DataPipes to let users define it in their pipeline. And, we have decided to make pin_memory as an adapt_fn. See: https://github.com/pytorch/data/pull/485

For worker_init_fn, I think we still need it for users to control the state of worker process if they have special use cases.

I like the idea by providing some reasonable number of workers by default, since it makes no sense to provide num_worker=0 to MPRS to achieve single process iteration.