huggingface / datasets

🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
https://huggingface.co/docs/datasets
Apache License 2.0
19.29k stars, 2.7k forks

Align filename prefix splitting with WebDataset library #7151

Closed: albertvillanova closed this 2 months ago

albertvillanova commented 2 months ago

Align filename prefix splitting with WebDataset library.

This PR uses the same `base_plus_ext` function as the webdataset library to split each filename into its key prefix and its extension.
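To illustrate the kind of splitting involved, here is a minimal sketch modeled on my understanding of webdataset's `base_plus_ext`. The regular expression and the `naive_split` comparison are assumptions written for illustration, not code copied from this PR or from either library, and the sample path is made up.

```python
import re


def base_plus_ext(path):
    """Split a path into (prefix, all extensions).

    Sketch only: the split happens at the first dot that follows the
    last slash, so dots in directory names stay in the prefix.
    Returns (None, None) when the file name has no extension.
    """
    match = re.match(r"^((?:.*/|)[^.]+)[.]([^/]*)$", path)
    if not match:
        return None, None
    return match.group(1), match.group(2)


def naive_split(path):
    """Hypothetical comparison: split at the first dot anywhere in the path."""
    prefix, _, ext = path.partition(".")
    return prefix, ext


# Illustrative path with a dot in the directory name.
path = "dir.v1/sample001.seg.png"
print(base_plus_ext(path))  # ('dir.v1/sample001', 'seg.png')
print(naive_split(path))    # ('dir', 'v1/sample001.seg.png')
```

The two functions agree on paths whose directories contain no dots. They differ only when a dot appears before the final path component, which is where a simple first-dot split would group files under the wrong key.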

Fix #7150.

Related to #7144.