huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

`dataloader_prefetch_factor` is left unused for datasets of type `IterableDataset` #32169

Open jgreer013 opened 1 month ago

jgreer013 commented 1 month ago

https://github.com/huggingface/transformers/blob/c85510f958e6955d88ea1bafb4f320074bfbd0c1/src/transformers/trainer.py#L908
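For context, a simplified sketch of the branching the link points at (the function name and parameter set here are illustrative, not the actual Trainer code): `prefetch_factor` is only added inside the map-style branch, so it is silently dropped for `IterableDataset` inputs.

```python
def build_dataloader_params(dataset_is_iterable: bool, prefetch_factor: int) -> dict:
    """Hypothetical reduction of Trainer.get_train_dataloader's param building."""
    # Params that are always set (heavily simplified).
    params = {"num_workers": 2}
    if not dataset_is_iterable:
        # sampler/drop_last genuinely only apply to map-style datasets,
        # but prefetch_factor is also gated here, which is the reported bug.
        params["prefetch_factor"] = prefetch_factor
    return params

print(build_dataloader_params(False, 4))  # {'num_workers': 2, 'prefetch_factor': 4}
print(build_dataloader_params(True, 4))   # {'num_workers': 2} -- prefetch_factor lost
```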

amyeroberts commented 1 month ago

cc @muellerzr @SunMarc

SunMarc commented 1 month ago

Hi @jgreer013, thanks for reporting! Could you try removing that line and testing whether this works correctly with `IterableDataset`s? That is, check that there is indeed a speedup, or at least that the prefetching works correctly. Thanks!
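A minimal starting point for that check (a sketch, not Trainer code; `RangeStream` is a made-up toy dataset): PyTorch's own `DataLoader` accepts `prefetch_factor` for an `IterableDataset` as long as `num_workers > 0`, so the gating in `trainer.py` is not something PyTorch requires.

```python
from torch.utils.data import DataLoader, IterableDataset


class RangeStream(IterableDataset):
    """Toy IterableDataset yielding 0..n-1 (hypothetical test fixture)."""

    def __init__(self, n):
        self.n = n

    def __iter__(self):
        return iter(range(self.n))


# prefetch_factor is only valid with worker processes (num_workers > 0).
# Constructing the loader does not spawn workers yet; it only validates args.
loader = DataLoader(RangeStream(8), batch_size=4, num_workers=2, prefetch_factor=2)
print(loader.prefetch_factor)  # 2
```

From there, one would iterate the loader and time it against `prefetch_factor=None` to see whether prefetching actually yields a speedup for a given workload.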

github-actions[bot] commented 3 weeks ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

SunMarc commented 3 weeks ago

Hi @jgreer013, did you have time to check?