I want to download Common Voice 17.0 hy-AM but it returns an error.
The version_base parameter is not specified.
Please specify a compatability version level, or None.
Will assume defaults for version 1.1
@hydra.main(config_name='hfds_config', config_path=None)
/usr/local/lib/python3.10/dist-packages/hydra/_internal/hydra.py:119: UserWarning: Future Hydra versions will no longer change working directory at job runtime by default.
See https://hydra.cc/docs/1.2/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information.
ret = run_job(
/usr/local/lib/python3.10/dist-packages/datasets/load.py:1429: FutureWarning: The repository for mozilla-foundation/common_voice_17_0 contains custom code which must be executed to correctly load the dataset. You can inspect the repository content at https://hf.co/datasets/mozilla-foundation/common_voice_17_0
You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this dataset from the next major release of `datasets`.
warnings.warn(
Reading metadata...: 6180it [00:00, 133224.37it/s]les/s]
Generating train split: 0 examples [00:00, ? examples/s]
HuggingFace datasets failed due to some reason (stack trace below).
For certain datasets (eg: MCV), it may be necessary to login to the huggingface-cli (via `huggingface-cli login`).
Once logged in, you need to set `use_auth_token=True` when calling this script.
Traceback error for reference :
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/datasets/builder.py", line 1743, in _prepare_split_single
example = self.info.features.encode_example(record) if self.info.features is not None else record
File "/usr/local/lib/python3.10/dist-packages/datasets/features/features.py", line 1878, in encode_example
return encode_nested_example(self, example)
File "/usr/local/lib/python3.10/dist-packages/datasets/features/features.py", line 1243, in encode_nested_example
{
File "/usr/local/lib/python3.10/dist-packages/datasets/features/features.py", line 1243, in <dictcomp>
{
File "/usr/local/lib/python3.10/dist-packages/datasets/utils/py_utils.py", line 326, in zip_dict
yield key, tuple(d[key] for d in dicts)
File "/usr/local/lib/python3.10/dist-packages/datasets/utils/py_utils.py", line 326, in <genexpr>
yield key, tuple(d[key] for d in dicts)
KeyError: 'sentence_id'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/workspace/nemo/scripts/speech_recognition/convert_hf_dataset_to_nemo.py", line 358, in main
dataset = load_dataset(
File "/usr/local/lib/python3.10/dist-packages/datasets/load.py", line 2549, in load_dataset
builder_instance.download_and_prepare(
File "/usr/local/lib/python3.10/dist-packages/datasets/builder.py", line 1005, in download_and_prepare
self._download_and_prepare(
File "/usr/local/lib/python3.10/dist-packages/datasets/builder.py", line 1767, in _download_and_prepare
super()._download_and_prepare(
File "/usr/local/lib/python3.10/dist-packages/datasets/builder.py", line 1100, in _download_and_prepare
self._prepare_split(split_generator, **prepare_split_kwargs)
File "/usr/local/lib/python3.10/dist-packages/datasets/builder.py", line 1605, in _prepare_split
for job_id, done, content in self._prepare_split_single(
File "/usr/local/lib/python3.10/dist-packages/datasets/builder.py", line 1762, in _prepare_split_single
raise DatasetGenerationError("An error occurred while generating the dataset") from e
datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
Steps to reproduce the bug
from datasets import load_dataset
cv_17 = load_dataset("mozilla-foundation/common_voice_17_0", "hy-AM")
Describe the bug
I want to download Common Voice 17.0 hy-AM but it returns an error.
Steps to reproduce the bug
Expected behavior
It works fine with common_voice_16_1
Environment info
datasets
version: 2.18.0huggingface_hub
version: 0.22.2fsspec
version: 2024.2.0