czmrand opened this issue 2 years ago
You might want to take a look at the TorchX project (https://pytorch.org/torchx/latest/examples_apps/index.html). It may provide the feature you need.
Transferred this issue, as we are definitely not going to implement it as a classic DataLoader feature, but are looking to have it in DLv2.
🚀 The feature, motivation and pitch
Occasionally one might find that their GPU is idle due to a bottleneck in the input data pre-processing pipeline (which might include data loading, filtering, manipulation, augmentation, etc.). In these cases one could improve resource utilization by offloading some of the pre-processing to auxiliary CPU devices. I have demonstrated how to do this using gRPC in the following blog post: https://towardsdatascience.com/overcoming-ml-data-preprocessing-bottlenecks-with-grpc-ca30fdc01bee
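To illustrate the general pattern (not a proposed PyTorch API), here is a minimal sketch of offloading per-sample pre-processing to worker processes so the main, GPU-feeding process only consumes ready results. The `preprocess` and `offload_preprocess` names are illustrative placeholders; in the scenario described above the workers would live on auxiliary CPU hosts reached over gRPC rather than local processes.

```python
from multiprocessing import Pool

def preprocess(sample):
    # Stand-in for an expensive per-sample transform
    # (decoding, filtering, augmentation, ...).
    return sample * 2

def offload_preprocess(batch, workers=2):
    # Fan the batch out to a pool of worker processes; the caller
    # (the process feeding the GPU) only collects finished samples.
    with Pool(processes=workers) as pool:
        return pool.map(preprocess, batch)

if __name__ == "__main__":
    print(offload_preprocess([1, 2, 3]))  # -> [2, 4, 6]
```

The feature request below asks for this offloading step to be a first-class, remote-capable part of the data loading pipeline rather than something each user wires up by hand.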
TensorFlow has built-in (experimental) support for this feature (https://www.tensorflow.org/api_docs/python/tf/data/experimental/service) that enables offloading in a few simple steps.
The request here is to include PyTorch APIs for offloading data pre-processing in a manner that is simple and straightforward for the user, similar to the TensorFlow APIs (though preferably without any limitations on the pre-processing workload).
Alternatives
No response
Additional context
No response
cc @SsnL @VitalyFedyunin @ejguan @NivekT