huggingface / datasets

🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
https://huggingface.co/docs/datasets
Apache License 2.0

Error while following docs to load the `ted_talks_iwslt` dataset #2059

Closed: ekdnam closed this issue 3 years ago

ekdnam commented 3 years ago

I am trying to load the ted_talks_iwslt dataset in Google Colab.

The docs give the following call for this:

dataset = load_dataset("ted_talks_iwslt", language_pair=("it", "pl"), year="2014")

Executing it raises the error below.

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-6-7dcc67154ef9> in <module>()
----> 1 dataset = load_dataset("ted_talks_iwslt", language_pair=("it", "pl"), year="2014")

4 frames
/usr/local/lib/python3.7/dist-packages/datasets/load.py in load_dataset(path, name, data_dir, data_files, split, cache_dir, features, download_config, download_mode, ignore_verifications, keep_in_memory, save_infos, script_version, use_auth_token, **config_kwargs)
    730         hash=hash,
    731         features=features,
--> 732         **config_kwargs,
    733     )
    734 

/usr/local/lib/python3.7/dist-packages/datasets/builder.py in __init__(self, writer_batch_size, *args, **kwargs)
    927 
    928     def __init__(self, *args, writer_batch_size=None, **kwargs):
--> 929         super(GeneratorBasedBuilder, self).__init__(*args, **kwargs)
    930         # Batch size used by the ArrowWriter
    931         # It defines the number of samples that are kept in memory before writing them

/usr/local/lib/python3.7/dist-packages/datasets/builder.py in __init__(self, cache_dir, name, hash, features, **config_kwargs)
    241             name,
    242             custom_features=features,
--> 243             **config_kwargs,
    244         )
    245 

/usr/local/lib/python3.7/dist-packages/datasets/builder.py in _create_builder_config(self, name, custom_features, **config_kwargs)
    337             if "version" not in config_kwargs and hasattr(self, "VERSION") and self.VERSION:
    338                 config_kwargs["version"] = self.VERSION
--> 339             builder_config = self.BUILDER_CONFIG_CLASS(**config_kwargs)
    340 
    341         # otherwise use the config_kwargs to overwrite the attributes

/root/.cache/huggingface/modules/datasets_modules/datasets/ted_talks_iwslt/024d06b1376b361e59245c5878ab8acf9a7576d765f2d0077f61751158e60914/ted_talks_iwslt.py in __init__(self, language_pair, year, **kwargs)
    219             description=description,
    220             version=datasets.Version("1.1.0", ""),
--> 221             **kwargs,
    222         )
    223 

TypeError: __init__() got multiple values for keyword argument 'version'

How can I resolve this?

PS: Thanks a lot @huggingface team for creating this great library!

ekdnam commented 3 years ago

@skyprince999 as you authored the PR for this dataset, any comments?

lhoestq commented 3 years ago

This has been fixed in #2064 by @mariosasko (thanks again!)

The fix is available on the master branch and we'll do a new release very soon :)
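
For anyone hitting the same message before the release: the traceback suggests that `_create_builder_config` injects `version` into `config_kwargs`, and the dataset's config `__init__` then passes its own `version=` alongside `**kwargs`. Below is a minimal sketch of that failure mode using simplified stand-in classes. `BuilderConfig`, `TedTalksConfig`, `FixedTedTalksConfig`, and `create_builder_config` here are hypothetical illustrations, not the real `datasets` code, and the "fixed" variant is only one possible way to avoid the clash, not necessarily what #2064 changed.

```python
# Simplified stand-ins for the classes seen in the traceback.
# These are illustrative only and are NOT the real `datasets` implementation.
class BuilderConfig:
    def __init__(self, name="default", version=None, description=None):
        self.name = name
        self.version = version
        self.description = description


class TedTalksConfig(BuilderConfig):
    """Hypothetical config that hard-codes `version` and also forwards **kwargs."""

    def __init__(self, language_pair=(None, None), year=None, **kwargs):
        self.language_pair = language_pair
        self.year = year
        # If kwargs already contains "version", it is passed twice here.
        super().__init__(
            name=f"{language_pair[0]}_{language_pair[1]}_{year}",
            version="1.1.0",
            **kwargs,
        )


def create_builder_config(config_class, default_version, **config_kwargs):
    # Mirrors the traceback step that adds the builder's VERSION when the
    # caller did not supply one.
    if "version" not in config_kwargs and default_version:
        config_kwargs["version"] = default_version
    return config_class(**config_kwargs)


# Reproduce the error from the issue.
try:
    create_builder_config(
        TedTalksConfig, "1.1.0", language_pair=("it", "pl"), year="2014"
    )
    error_message = None
except TypeError as err:
    error_message = str(err)

print(error_message)


class FixedTedTalksConfig(BuilderConfig):
    """One possible way to avoid the clash (an assumption, not the actual fix)."""

    def __init__(self, language_pair=(None, None), year=None, **kwargs):
        self.language_pair = language_pair
        self.year = year
        # Only set a default version if the caller has not provided one.
        kwargs.setdefault("version", "1.1.0")
        super().__init__(
            name=f"{language_pair[0]}_{language_pair[1]}_{year}",
            **kwargs,
        )


config = create_builder_config(
    FixedTedTalksConfig, "1.1.0", language_pair=("it", "pl"), year="2014"
)
print(config.name, config.version)
```

The point of the sketch is that a keyword passed explicitly and again through `**kwargs` raises `TypeError` at the call site, so the subclass has to either stop hard-coding `version` or remove it from `kwargs` before forwarding.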