Describe the bug
I'm trying to download only the 'validation' split of my dataset, but instead I hit the error datasets.exceptions.ExpectedMoreSplitsError.
This appears to be the same undesired behavior as reported in #6939, but with data_files rather than data_dir.
Here is the traceback:
```
Traceback (most recent call last):
  File "/home/user/app/app.py", line 12, in <module>
    ds = load_dataset('datacomp/imagenet-1k-random0.0', token=GATED_IMAGENET, data_files={'validation': 'data/val*'}, split='validation', trust_remote_code=True)
  File "/usr/local/lib/python3.10/site-packages/datasets/load.py", line 2154, in load_dataset
    builder_instance.download_and_prepare(
  File "/usr/local/lib/python3.10/site-packages/datasets/builder.py", line 924, in download_and_prepare
    self._download_and_prepare(
  File "/usr/local/lib/python3.10/site-packages/datasets/builder.py", line 1018, in _download_and_prepare
    verify_splits(self.info.splits, split_dict)
  File "/usr/local/lib/python3.10/site-packages/datasets/utils/info_utils.py", line 68, in verify_splits
    raise ExpectedMoreSplitsError(str(set(expected_splits) - set(recorded_splits)))
datasets.exceptions.ExpectedMoreSplitsError: {'train', 'test'}
```
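The last frame shows where the error comes from: verify_splits compares the splits recorded in the dataset's metadata with the splits that were actually generated, and raises with their set difference. Below is a simplified, hypothetical re-creation of that check; the names missing_splits and check_splits and the stand-in exception class are illustrative, not the library's code. It assumes the repo metadata declares train, validation and test, while data_files={'validation': 'data/val*'} generates only validation.

```python
class ExpectedMoreSplitsError(Exception):
    """Stand-in for datasets.exceptions.ExpectedMoreSplitsError (illustration only)."""


def missing_splits(expected_splits, recorded_splits):
    """Splits declared in the metadata but never generated locally."""
    return set(expected_splits) - set(recorded_splits)


def check_splits(expected_splits, recorded_splits):
    """Hypothetical simplification of the set-difference check in verify_splits."""
    missing = missing_splits(expected_splits, recorded_splits)
    if missing:
        raise ExpectedMoreSplitsError(str(missing))


# Assumption: metadata lists three splits, but only 'validation' was built.
expected = {"train", "validation", "test"}
recorded = {"validation"}

print(sorted(missing_splits(expected, recorded)))  # ['test', 'train']

try:
    check_splits(expected, recorded)
except ExpectedMoreSplitsError as err:
    print("ExpectedMoreSplitsError:", sorted(missing_splits(expected, recorded)))
```

Under these assumptions, requesting a subset of splits through data_files can never satisfy a check that expects every split from the metadata, which matches the {'train', 'test'} in the error message.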
Note: I am using the data_files argument only to specify that I want just the 'validation' split. Without data_files, the whole dataset is downloaded even when split='validation' is passed, as described here: https://discuss.huggingface.co/t/how-can-i-download-a-specific-split-of-a-dataset/79027
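Why data_files limits the download: only files matching the pattern are resolved for the validation split. The sketch below approximates this with Python's fnmatch.fnmatchcase on a hypothetical list of repo files. The file names and the select_files helper are made up for illustration, and the library's real pattern resolution is glob-style (for example, * does not cross /), so it differs in details.

```python
from fnmatch import fnmatchcase

# Hypothetical repo layout; the real file names in the dataset may differ.
repo_files = [
    "data/train-00000-of-00002.parquet",
    "data/train-00001-of-00002.parquet",
    "data/val-00000-of-00001.parquet",
    "data/test-00000-of-00001.parquet",
    "README.md",
]

data_files = {"validation": "data/val*"}


def select_files(files, pattern):
    """Return the files a simple glob-like pattern would pick (approximation)."""
    return [f for f in files if fnmatchcase(f, pattern)]


selected = select_files(repo_files, data_files["validation"])
print(selected)  # ['data/val-00000-of-00001.parquet']
```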
Steps to reproduce the bug
```python
from datasets import load_dataset
ds = load_dataset('datacomp/imagenet-1k-random0.0', token=GATED_IMAGENET, data_files={'validation': 'data/val*'}, split='validation', trust_remote_code=True)
```
Expected behavior
Only the validation split is downloaded and loaded, without raising an error.
Environment info
Default environment for creating a new Space. Relevant to this bug, that is: