huggingface / datasets

🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
https://huggingface.co/docs/datasets
Apache License 2.0
18.6k stars 2.55k forks

add `with_transform` and/or `set_transform` to IterableDataset #6890

Open not-lain opened 3 weeks ago

not-lain commented 3 weeks ago

Feature request

When working with a really large dataset, it would save a lot of time (and compute resources) if `IterableDataset` supported `with_transform` or `set_transform`, as the `Dataset` class does, so that the transform is applied on the fly instead of waiting for the entire dataset to be mapped.
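To make the request concrete, here is a minimal sketch of the behavior I have in mind. `LazilyTransformed`, `add_length`, and `stream` are hypothetical names made up for illustration, not part of the `datasets` API, and the transform here works per example for simplicity (the real `Dataset.with_transform` receives batches).

```python
from typing import Any, Callable, Dict, Iterable, Iterator

Example = Dict[str, Any]


class LazilyTransformed:
    """Hypothetical stand-in for the requested feature (not a real datasets class)."""

    def __init__(self, source: Iterable[Example], transform: Callable[[Example], Example]):
        self.source = source
        self.transform = transform

    def __iter__(self) -> Iterator[Example]:
        # The transform runs only when an example is actually requested,
        # so nothing is computed up front for the whole dataset.
        for example in self.source:
            yield self.transform(example)


calls = []  # records which examples the transform has touched


def add_length(example: Example) -> Example:
    calls.append(example["text"])
    return {**example, "length": len(example["text"])}


def stream() -> Iterator[Example]:
    # Stands in for a very large streamed dataset.
    for text in ["a", "bb", "ccc"]:
        yield {"text": text}


ds = LazilyTransformed(stream(), add_length)
first = next(iter(ds))  # only the first example is transformed at this point
```

After pulling a single example, `calls` contains only that one example, which is the "no waiting for the whole map" behavior this issue asks for.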

Motivation

I don't want to wait for a really long dataset to finish mapping. This would give `IterableDataset` an extra advantage over the `Dataset` class by reducing time and resource usage.

Your contribution

I am a little busy with my job search lately, but I can post about this feature on my social media. Apologies again (my dad is going to kick me out soon). If I ever have some free time I will contribute to making this a reality, but that's going to be hard / (┬┬﹏┬┬)\