Opened by dynamicwebpaige 2 years ago
Hi! Yes, this is definitely something we'll explore, since optimizing processing pipelines can be challenging and performance is key here: we want anyone to be able to work with large-scale datasets more easily.
I think we'll start by documenting the performance of the dataset transforms we provide, and then we can build some tools to help debug and optimize them.
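Before reaching for a full profiler, a quick way to check whether a given transform is the bottleneck is to time it directly. This is a minimal sketch (the `transform_batch` function and the sample data are made up for illustration; in practice the timed call would be a batch passed through your own preprocessing function):

```python
import time

def transform_batch(texts):
    # Hypothetical preprocessing step: lowercase every text in the batch.
    return [t.lower() for t in texts]

# Stand-in corpus; a real check would use a representative batch of examples.
data = ["Some Example Text"] * 100_000

start = time.perf_counter()
result = transform_batch(data)
elapsed = time.perf_counter() - start

print(f"Processed {len(result)} examples in {elapsed:.4f}s")
```

Timing a single transform in isolation like this makes it easy to compare variants (e.g. batched vs. per-example processing) before profiling the whole pipeline.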
Brief Overview

Downloading, saving, and preprocessing large datasets with the `datasets` library can often result in performance bottlenecks. These performance snags can be challenging to identify and to debug, especially for users who are less experienced with building deep learning experiments.

Feature Request

Could we create a performance guide for using `datasets`, similar to:

- the `tf.data` API performance guide
- analyzing `tf.data` performance with the TF Profiler

This performance guide should detail practical options for improving performance with `datasets`, and enumerate any common best practices. It should also show how to use tools like the PyTorch Profiler or the TF Profiler to identify any performance bottlenecks (example below).

Related Issues