huggingface / datasets

🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
https://huggingface.co/docs/datasets
Apache License 2.0

shard_data_sources() got an unexpected keyword argument 'worker_id' #7187

Open Qinghao-Hu opened 1 month ago

Qinghao-Hu commented 1 month ago

Describe the bug

[rank0]:   File "/home/qinghao/miniconda3/envs/doremi/lib/python3.10/site-packages/datasets/iterable_dataset.py", line 238, in __iter__
[rank0]:     for key_example in islice(self.generate_examples_fn(**gen_kwags), shard_example_idx_start, None):
[rank0]:   File "/home/qinghao/miniconda3/envs/doremi/lib/python3.10/site-packages/datasets/packaged_modules/generator/generator.py", line 32, in _generate_examples
[rank0]:     for idx, ex in enumerate(self.config.generator(**gen_kwargs)):
[rank0]:   File "/home/qinghao/workdir/doremi/doremi/dataloader.py", line 337, in take_data_generator
[rank0]:     for ex in ds:
[rank0]:   File "/home/qinghao/miniconda3/envs/doremi/lib/python3.10/site-packages/datasets/iterable_dataset.py", line 1791, in __iter__
[rank0]:     yield from self._iter_pytorch()
[rank0]:   File "/home/qinghao/miniconda3/envs/doremi/lib/python3.10/site-packages/datasets/iterable_dataset.py", line 1704, in _iter_pytorch
[rank0]:     ex_iterable = ex_iterable.shard_data_sources(worker_id=worker_info.id, num_workers=worker_info.num_workers)
[rank0]: TypeError: UpdatableRandomlyCyclingMultiSourcesExamplesIterable.shard_data_sources() got an unexpected keyword argument 'worker_id'

Steps to reproduce the bug

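Iterating over the IterableDataset fails: the loop over ds in take_data_generator (see the traceback above) raises the TypeError.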

Expected behavior

This works with datasets==2.10, but later versions raise the error above.
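
The traceback points to a signature mismatch: _iter_pytorch calls shard_data_sources(worker_id=..., num_workers=...), while the custom iterable's override does not accept those keywords. The sketch below reproduces that kind of failure with stand-in classes only. LegacyIterable, UpdatedIterable, call_like_iter_pytorch, and the shard_indices parameter name are hypothetical and are not taken from datasets or doremi, and the strided split is just one plausible way to divide shards among workers, not necessarily what the library does.

```python
# Stand-in classes only; these are NOT the real `datasets` or doremi classes.


class LegacyIterable:
    """Hypothetical override written against an older method signature."""

    def __init__(self, shards):
        self.shards = list(shards)

    # `shard_indices` is an assumed legacy parameter name, for illustration.
    def shard_data_sources(self, shard_indices):
        return LegacyIterable(self.shards[i] for i in shard_indices)


class UpdatedIterable(LegacyIterable):
    """Same idea, but accepting the keywords seen in the traceback."""

    def shard_data_sources(self, worker_id, num_workers):
        # Assumption: worker k takes every num_workers-th shard, starting at k.
        return UpdatedIterable(self.shards[worker_id::num_workers])


def call_like_iter_pytorch(ex_iterable, worker_id, num_workers):
    # Mirrors the keyword-style call shown in the traceback.
    return ex_iterable.shard_data_sources(
        worker_id=worker_id, num_workers=num_workers
    )


shards = ["a", "b", "c", "d", "e"]

legacy_error = None
try:
    call_like_iter_pytorch(LegacyIterable(shards), worker_id=0, num_workers=2)
except TypeError as err:
    # Same kind of TypeError as in the report.
    legacy_error = str(err)
    print(legacy_error)

updated = call_like_iter_pytorch(UpdatedIterable(shards), worker_id=1, num_workers=2)
print(updated.shards)
```

If this reading is right, a custom iterable written against the older interface would need its shard_data_sources override updated to accept worker_id and num_workers before it can run on datasets==3.0.1.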

Environment info

datasets==3.0.1