Open msaroufim opened 2 years ago
Just a note: MultiProcessingReadingService (MPRS) is a temporary reading service. We will change it to the PrototypeMultiProcessingReadingService. For prefetch_factor, we might provide it as DataPipes to let users define it in their pipeline. And we have decided to make pin_memory an adapt_fn. See: https://github.com/pytorch/data/pull/485

For worker_init_fn, I think we still need it so that users can control the state of the worker process if they have special use cases.

I like the idea of providing a reasonable number of workers by default, since it makes no sense to pass num_workers=0 to MPRS just to achieve single-process iteration.
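To make the discussion concrete, here is a minimal sketch of what "reasonable defaults" could look like, expressed as the familiar DataLoader keyword arguments the issue cites. The helper name and the heuristics (one worker per CPU core, pin_memory only when a CUDA device is in use, persistent workers whenever there are workers) are illustrative assumptions, not an agreed-upon policy:

```python
import os

def suggested_loader_kwargs(cuda_available: bool = False) -> dict:
    """Hypothetical helper returning commonly recommended DataLoader
    keyword arguments. The key names match torch.utils.data.DataLoader's
    parameters; the values encode the tuning tips discussed in this issue.
    """
    # one worker per CPU core is the usual starting point
    num_workers = os.cpu_count() or 0
    return {
        "num_workers": num_workers,
        # pinning host memory only helps when copying batches to a CUDA device
        "pin_memory": cuda_available,
        # keep workers alive between epochs to avoid re-spawn overhead
        "persistent_workers": num_workers > 0,
    }
```

A caller would then do something like DataLoader(dataset, **suggested_loader_kwargs(torch.cuda.is_available())); the point is only that these three values could be derived automatically instead of defaulting to 0/False/False.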
🚀 The feature
I can add these; I'm opening this issue to discuss whether it's a good idea to change the defaults.
+: Users get better out-of-the-box performance with torchdata
-: backward-compatibility issues when moving from dataloaderv1 to dataloaderv2
Motivation, pitch
There are many issues on the discussion forums, Stack Overflow, and blog posts describing how people should configure data loaders for optimal performance. A lot of the tricks haven't changed, like pin_memory = true or num_workers = num_cpu_cores or persistent_workers = true, and since we're in the process of developing dataloaderv2, now may be a good time to revisit these default values.
Alternatives
A linter.py that suggests some of these tips when we notice likely sources of slowdown
Additional context
No response
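As a sketch of the linter.py alternative mentioned above: a pass over the user's loader configuration could emit the same tips this issue lists. Everything here (function name, thresholds, messages) is an illustrative assumption about what such a linter might check, not an existing torchdata API:

```python
import os

def lint_loader_kwargs(kwargs: dict) -> list:
    """Hypothetical linter pass flagging DataLoader settings that are
    commonly associated with slow input pipelines. Returns a list of
    human-readable tips; an empty list means nothing was flagged.
    """
    tips = []
    if kwargs.get("num_workers", 0) == 0:
        # loading in the main process serializes I/O with training
        tips.append(
            "num_workers=0 runs loading in the main process; "
            f"consider num_workers={os.cpu_count()}."
        )
    if not kwargs.get("pin_memory", False):
        tips.append("pin_memory=True can speed up host-to-GPU copies.")
    if kwargs.get("num_workers", 0) > 0 and not kwargs.get("persistent_workers", False):
        tips.append("persistent_workers=True avoids re-spawning workers each epoch.")
    return tips
```

For example, lint_loader_kwargs({}) flags both the worker count and pin_memory, while a fully tuned configuration produces no tips.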