There is a bug when using load_dataset with datasets version 3.0.1.
Please see "Steps to reproduce the bug" below.
To work around the bug, I had to downgrade to version 2.21.0.
OS: Ubuntu 24 (AWS instance)
Python: same bug under 3.12 and 3.10
The error I had was:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/ubuntu/miniconda3/envs/maxence_env/lib/python3.10/site-packages/datasets/load.py", line 2096, in load_dataset
builder_instance.download_and_prepare(
File "/home/ubuntu/miniconda3/envs/maxence_env/lib/python3.10/site-packages/datasets/builder.py", line 924, in download_and_prepare
self._download_and_prepare(
File "/home/ubuntu/miniconda3/envs/maxence_env/lib/python3.10/site-packages/datasets/builder.py", line 1647, in _download_and_prepare
super()._download_and_prepare(
File "/home/ubuntu/miniconda3/envs/maxence_env/lib/python3.10/site-packages/datasets/builder.py", line 977, in _download_and_prepare
split_generators = self._split_generators(dl_manager, split_generators_kwargs)
File "/home/ubuntu/.cache/huggingface/modules/datasets_modules/datasets/mozilla-foundation--common_voice_6_0/cb17afd34f5799f97e8f48398748f83006335b702bd785f9880797838d541b81/common_voice_6_0.py", line 159, in _split_generators
archive_path = dl_manager.download(self._get_bundle_url(self.config.name, bundle_url_template))
File "/home/ubuntu/miniconda3/envs/maxence_env/lib/python3.10/site-packages/datasets/download/download_manager.py", line 150, in download
download_config = self.download_config.copy()
File "/home/ubuntu/miniconda3/envs/maxence_env/lib/python3.10/site-packages/datasets/download/download_config.py", line 73, in copy
return self.__class__(**{k: copy.deepcopy(v) for k, v in self.__dict__.items()})
TypeError: DownloadConfig.__init__() got an unexpected keyword argument 'ignore_url_params'
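The last two frames suggest a mechanism that can be sketched in isolation: copy() forwards every instance attribute to the constructor as a keyword argument, so any attribute the constructor does not accept raises this TypeError. The sketch below is only an illustration of that pattern. ToyConfig is a hypothetical stand-in, not the real DownloadConfig, and it is an assumption on my part that something sets ignore_url_params on the config instance before copy() is called.

```python
import copy
from dataclasses import dataclass


@dataclass
class ToyConfig:
    """Hypothetical stand-in for a config class; NOT the real DownloadConfig."""

    cache_dir: str = "/tmp/cache"

    def copy(self) -> "ToyConfig":
        # Same pattern as the copy() frame in the traceback: every instance
        # attribute is forwarded to the constructor as a keyword argument.
        return self.__class__(**{k: copy.deepcopy(v) for k, v in self.__dict__.items()})


config = ToyConfig()
config.copy()  # works: only declared fields are in __dict__

# Assumption: some caller sets an attribute the constructor does not accept.
config.ignore_url_params = True

try:
    config.copy()
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

If this reading is right, the failure comes from an attribute that the constructor in the installed version does not accept, which would also explain why pinning the older version avoids it.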
Steps to reproduce the bug
Install datasets with pip install datasets --upgrade
Launch python; from datasets import load_dataset
Run load_dataset("mozilla-foundation/common_voice_6_0"); this raises the TypeError above
Exit python
Uninstall datasets, then pip install datasets==2.21.0
Launch python; from datasets import load_dataset
Run load_dataset("mozilla-foundation/common_voice_6_0")
Everything runs fine now
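Before running the steps, it can help to confirm which datasets version is actually installed in the active environment. The snippet below is a rough sketch using only the standard library. The helper names (parse_version, may_be_affected, installed_datasets_version) are made up for illustration, and treating every 3.x release as affected is an assumption: only 3.0.1 (fails) and 2.21.0 (works) were tested here.

```python
import re
from importlib import metadata
from typing import Optional, Tuple


def parse_version(version: str) -> Optional[Tuple[int, int, int]]:
    """Extract (major, minor, patch) from a string such as '3.0.1'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        return None
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def may_be_affected(version: str) -> bool:
    # Assumption: only 3.0.1 (fails) and 2.21.0 (works) were tested in this
    # report, so flagging every release >= 3.0.0 is a guess, not a known range.
    parsed = parse_version(version)
    return parsed is not None and parsed >= (3, 0, 0)


def installed_datasets_version() -> Optional[str]:
    """Return the installed datasets version, or None if it is not installed."""
    try:
        return metadata.version("datasets")
    except metadata.PackageNotFoundError:
        return None


version = installed_datasets_version()
if version is None:
    print("datasets is not installed in this environment")
elif may_be_affected(version):
    print(f"datasets {version}: may hit the error; 2.21.0 worked for me")
else:
    print(f"datasets {version}: not in the range where I saw the error")
```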
Expected behavior
Be able to download a dataset without error
Environment info
datasets version: 3.0.1
huggingface_hub version: 0.26.0
fsspec version: 2024.6.1