Open Coolcoder45 opened 4 months ago
Hi, thank you for reporting! This is definitely a bug.
Workaround: add the following arg to your tfds.load call:
tfds.load(..., download_and_prepare_kwargs={'file_format': tfds.core.FileFormat.ARRAY_RECORD})
We'll look into how to fix the code and post an update on this bug.
It's still giving an error.
import tensorflow_datasets as tfds
plant_leaves_data, plant_leaves_info = tfds.load('plant_leaves', split='train', shuffle_files=True, download_and_prepare_kwargs={'file_format': tfds.core.FileFormat.ARRAY_RECORD})
Gives
Downloading and preparing dataset 6.56 GiB (download: 6.56 GiB, generated: 6.81 GiB, total: 13.37 GiB) to /root/tensorflow_datasets/plant_leaves/0.1.1...
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
[<ipython-input-3-608b46b22c6c>](https://localhost:8080/#) in <cell line: 4>()
2 #plant_leaves = tfds.load('plant_leaves', split='train', shuffle_files=True)
3 #plant_leaves_data, plant_leaves_info = tfds.load('plant_leaves', split='train', shuffle_files=True, as_data_source=True)
----> 4 plant_leaves_data, plant_leaves_info = tfds.load('plant_leaves', split='train', shuffle_files=True, download_and_prepare_kwargs={'file_format': tfds.core.FileFormat.ARRAY_RECORD})
5 frames
[/usr/local/lib/python3.10/dist-packages/tensorflow_datasets/core/logging/__init__.py](https://localhost:8080/#) in __call__(self, function, instance, args, kwargs)
167 metadata = self._start_call()
168 try:
--> 169 return function(*args, **kwargs)
170 except Exception:
171 metadata.mark_error()
[/usr/local/lib/python3.10/dist-packages/tensorflow_datasets/core/load.py](https://localhost:8080/#) in load(name, split, data_dir, batch_size, shuffle_files, download, as_supervised, decoders, read_config, with_info, builder_kwargs, download_and_prepare_kwargs, as_dataset_kwargs, try_gcs)
645 try_gcs,
646 )
--> 647 _download_and_prepare_builder(dbuilder, download, download_and_prepare_kwargs)
648
649 if as_dataset_kwargs is None:
[/usr/local/lib/python3.10/dist-packages/tensorflow_datasets/core/load.py](https://localhost:8080/#) in _download_and_prepare_builder(dbuilder, download, download_and_prepare_kwargs)
504 if download:
505 download_and_prepare_kwargs = download_and_prepare_kwargs or {}
--> 506 dbuilder.download_and_prepare(**download_and_prepare_kwargs)
507
508
[/usr/local/lib/python3.10/dist-packages/tensorflow_datasets/core/logging/__init__.py](https://localhost:8080/#) in __call__(self, function, instance, args, kwargs)
167 metadata = self._start_call()
168 try:
--> 169 return function(*args, **kwargs)
170 except Exception:
171 metadata.mark_error()
[/usr/local/lib/python3.10/dist-packages/tensorflow_datasets/core/dataset_builder.py](https://localhost:8080/#) in download_and_prepare(self, download_dir, download_config, file_format)
679 # to generate the files.
680 if file_format:
--> 681 self.info.set_file_format(file_format, override=True)
682
683 # Create a tmp dir and rename to self.data_dir on successful exit.
[/usr/local/lib/python3.10/dist-packages/tensorflow_datasets/core/dataset_info.py](https://localhost:8080/#) in set_file_format(self, file_format, override)
470 )
471 if override and self._fully_initialized:
--> 472 raise RuntimeError(
473 "Cannot override the file format "
474 "when the DatasetInfo is already fully initialized!"
RuntimeError: Cannot override the file format when the DatasetInfo is already fully initialized!
Same errors on refcoco dataset.
NotImplementedError: `.as_dataset()` not implemented for ArrayRecord files. Please, use `.as_data_source()`.
Anyway, one thing I do to work around this is to set the file format on the builder before preparing the dataset:
builder = tfds.builder('ref_coco/refcocog_umd')
builder.info.set_file_format(tfds.core.FileFormat.PARQUET, override=True, override_if_initialized=True)
builder.download_and_prepare()
ref_ds = tfds.load('ref_coco/refcocog_umd', split='validation')
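The NotImplementedError quoted above suggests using `.as_data_source()` when the prepared files are ArrayRecord. A minimal sketch of following that hint is below. It assumes a TFDS release that exposes `tfds.data_source`; the helper name `load_with_fallback` is hypothetical, and this has not been verified against the affected 4.9.4 install, so it may still hit the RuntimeError from this thread.

```python
def load_with_fallback(tfds_module, name, split):
    """Try `load`; fall back to `data_source` on NotImplementedError.

    `tfds_module` is expected to be the imported `tensorflow_datasets`
    module (passed in so the helper has no import-time dependency).
    This helper is illustrative and not part of TFDS.
    """
    try:
        # Regular path: returns a tf.data.Dataset when the on-disk
        # format supports `.as_dataset()`.
        return tfds_module.load(name, split=split)
    except NotImplementedError:
        # ArrayRecord files do not support `.as_dataset()`; the error
        # message recommends the random-access data source instead.
        return tfds_module.data_source(name, split=split)
```

Possible usage, after `import tensorflow_datasets as tfds`: `ds = load_with_fallback(tfds, 'plant_leaves', 'train')`. Note that the fallback returns a random-access data source rather than a `tf.data.Dataset`, so downstream code that expects `tf.data` methods would need to be adapted.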
Short description tfds plant_leaves does not load successfully; it throws a NotImplementedError. Tried on May 16, 2024.
Environment information
Operating System: Windows 11
Python version: 3.10.12
tensorflow-datasets/tfds-nightly version: 4.9.4
tensorflow/tf-nightly version: 2.15.0
Does the issue still exist with the latest tfds-nightly package (pip install --upgrade tfds-nightly)? Yup
Reproduction instructions
Gives:
Expected behavior To load dataset successfully.