huggingface / datasets

🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
https://huggingface.co/docs/datasets
Apache License 2.0
19.28k stars, 2.7k forks

Concurrent loading in `load_from_disk` - `num_proc` as a param #7286

Closed by unography 1 week ago

unography commented 1 week ago

Feature request

https://github.com/huggingface/datasets/pull/6464 mentions a `num_proc` param for loading a dataset from disk, but I can't find it anywhere in the documentation or the code.

Motivation

Make loading large datasets from disk faster
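
A minimal sketch of the kind of concurrency this request is about, assuming the saved dataset directory holds several `.arrow` shard files that can be opened independently. The helper names `list_arrow_shards` and `load_shards_concurrently` are hypothetical and not part of `datasets`; the per-shard loader is passed in as a callable, so the sketch itself depends only on the standard library.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def list_arrow_shards(dataset_dir: str) -> List[str]:
    """Return the .arrow files in a directory, sorted by file name.

    Hypothetical helper: assumes shard names are zero-padded so that
    lexicographic order matches shard order.
    """
    return sorted(str(p) for p in Path(dataset_dir).glob("*.arrow"))


def load_shards_concurrently(
    shard_paths: Sequence[str],
    load_one: Callable[[str], T],
    num_workers: int = 4,
) -> List[T]:
    """Apply `load_one` to every shard path using a thread pool.

    Hypothetical helper: results come back in the same order as
    `shard_paths`, because `Executor.map` preserves input order.
    """
    if not shard_paths:
        return []
    # Never start more threads than there are shards, and at least one.
    workers = max(1, min(num_workers, len(shard_paths)))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(load_one, shard_paths))
```

With `datasets` installed, one untested way to use this would be to pass `Dataset.from_file` as `load_one` and combine the results with `concatenate_datasets`. That route skips the metadata that `load_from_disk` restores from the saved directory, so it is a workaround under stated assumptions rather than a replacement for a real `num_proc` param.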

Your contribution

Happy to contribute if given pointers