tensorflow / datasets

TFDS is a collection of datasets ready to use with TensorFlow, Jax, ...
https://www.tensorflow.org/datasets
Apache License 2.0

tensorflow_datasets v4.9.4 introduces bug that prevents loading datasets #5203

Open kpertsch opened 9 months ago

kpertsch commented 9 months ago

Short description

When upgrading to the most recent tensorflow_datasets==4.9.4, I get errors when loading datasets (from the official TFDS catalogue). I have verified that the same datasets load in version 4.9.3 without problems.

Environment information

Reproduction instructions

import tensorflow_datasets as tfds
ds = tfds.load("fractal20220817_data", data_dir="gs://gresearch/robotics")

OR colab: https://colab.research.google.com/drive/1neCJ3_TnF1tqr8qv4FM5__-v4SwVOOxJ?usp=sharing

Link to logs

FileNotFoundError                         Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/tensorflow_datasets/core/dataset_info.py in read_from_json(path)
   1033   try:
-> 1034     json_str = epath.Path(path).read_text()
   1035   except OSError as e:

27 frames
FileNotFoundError: [Errno 2] No such file or directory: 'fractal20220817_data/0.1.0/dataset_info.json'

The above exception was the direct cause of the following exception:

FileNotFoundError                         Traceback (most recent call last)
FileNotFoundError: Could not load dataset info from fractal20220817_data/0.1.0/dataset_info.json

The above exception was the direct cause of the following exception:

FileNotFoundError                         Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/tensorflow_datasets/core/utils/py_utils.py in reraise(e, prefix, suffix)
    383     else:
    384       exception = RuntimeError(f'{type(e).__name__}: {msg}')
--> 385     raise exception from e
    386   # Otherwise, modify the exception in-place
    387   elif len(e.args) <= 1:

FileNotFoundError: Failed to construct dataset "fractal20220817_data", builder_kwargs "{'data_dir': 'gs://gresearch/robotics'}": Could not load dataset info from fractal20220817_data/0.1.0/dataset_info.json

Additional context

Interestingly, constructing a builder with builder_from_directory still seems to work even in the most recent tfds version:

builder = tfds.builder_from_directory("gs://gresearch/robotics/fractal20220817_data/0.1.0")
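
The builder_from_directory workaround can be sketched roughly as follows. This is a minimal sketch, assuming the prepared dataset lives under `<data_dir>/<name>/<version>`, as the path in this report suggests. `versioned_builder_dir` and `load_from_directory` are hypothetical helper names for illustration, not TFDS APIs.

```python
import posixpath


def versioned_builder_dir(data_dir: str, name: str, version: str) -> str:
    """Join data_dir, dataset name, and version into one builder directory.

    posixpath keeps forward slashes, so gs:// URLs stay intact on any OS.
    """
    return posixpath.join(data_dir, name, version)


def load_from_directory(data_dir: str, name: str, version: str, split=None):
    """Open an already-prepared dataset straight from its versioned directory."""
    # Lazy import: the path helper above works without TFDS installed.
    import tensorflow_datasets as tfds

    builder = tfds.builder_from_directory(
        versioned_builder_dir(data_dir, name, version)
    )
    # split=None asks TFDS for all available splits.
    return builder.as_dataset(split=split)
```

With the values from this report, the call would be `load_from_directory("gs://gresearch/robotics", "fractal20220817_data", "0.1.0")`.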

tomvdw commented 9 months ago

Thanks for your detailed bug report!

This is caused by _GCS_BUCKET having been made empty in this commit: https://github.com/tensorflow/datasets/commit/b78fc27c4f830c590c28002b1a1d07ef14e588dc

I'll contact the people who changed it, but with the holidays I don't know how quickly they'll respond.

In the meantime you can also load it by specifying the version:

ds = tfds.load("fractal20220817_data:0.1.0", data_dir="gs://gresearch/robotics")
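
For scripts that need to run on both affected and unaffected versions, the version-pinning workaround could be wrapped in a small fallback. This is only a sketch: `pinned_name` and `load_with_pinned_fallback` are hypothetical helpers, and the loader is passed in as `load_fn` (for example `tfds.load`) so the logic does not depend on TFDS being importable. It relies on the failure surfacing as `FileNotFoundError`, as in the traceback above.

```python
from typing import Any, Callable


def pinned_name(name: str, version: str) -> str:
    """Return the 'name:version' form used in the workaround above."""
    return f"{name}:{version}"


def load_with_pinned_fallback(
    name: str,
    version: str,
    data_dir: str,
    load_fn: Callable[..., Any],
) -> Any:
    """Try the unversioned name first, then retry with the pinned version."""
    try:
        return load_fn(name, data_dir=data_dir)
    except FileNotFoundError:
        # The unversioned lookup failed, so request the exact version instead.
        return load_fn(pinned_name(name, version), data_dir=data_dir)
```

A call would look like `load_with_pinned_fallback("fractal20220817_data", "0.1.0", "gs://gresearch/robotics", tfds.load)`.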

tomvdw commented 8 months ago

A fix was submitted. Could you test with tfds nightly if it now works?

Ericodencoder commented 8 months ago

Hey, I am facing the same issue. I tried the recommended line in a Jupyter Notebook:

ds = tfds.load("fractal20220817_data:0.1.0", data_dir="gs://gresearch/robotics")

But it still does not work, and I got:

UnimplementedError: File system scheme 'gs' not implemented (file: 'gs://gresearch/robotics/fractal20220817_data/0.1.0/features.json')

The same line in Colab does not raise errors.

I also have a basic question: how can I download the dataset (for example, fractal20220817_data) to my local PC?

Thanks a lot!
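
For the local-download question, one possible approach is to copy the versioned directory with `gsutil` and then load from the local copy, which also avoids the `gs://` filesystem scheme altogether. This is a sketch under assumptions: the `gsutil` CLI is installed, and the bucket is readable from the local machine. `gsutil_copy_command`, `local_copy_dir`, and `download_and_load` are hypothetical helper names.

```python
import os
import posixpath
import subprocess
from typing import List


def gsutil_copy_command(gcs_dir: str, local_parent: str) -> List[str]:
    """Build a gsutil command that recursively copies gcs_dir into local_parent."""
    # -m runs transfers in parallel; cp -r copies the whole directory tree.
    return ["gsutil", "-m", "cp", "-r", gcs_dir, local_parent]


def local_copy_dir(gcs_dir: str, local_parent: str) -> str:
    """Return where the copy lands: local_parent plus the last GCS path component."""
    return os.path.join(local_parent, posixpath.basename(gcs_dir.rstrip("/")))


def download_and_load(gcs_dir: str, local_parent: str):
    """Copy a prepared dataset directory to local disk and open it with TFDS."""
    # Lazy import: the helpers above work without TFDS installed.
    import tensorflow_datasets as tfds

    # The destination must exist so gsutil nests the copied directory under it.
    os.makedirs(local_parent, exist_ok=True)
    subprocess.run(gsutil_copy_command(gcs_dir, local_parent), check=True)
    builder = tfds.builder_from_directory(local_copy_dir(gcs_dir, local_parent))
    return builder.as_dataset()
```

For example, `download_and_load("gs://gresearch/robotics/fractal20220817_data/0.1.0", "/tmp/fractal20220817_data")`, where the local path is only an illustration.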

Rahulraj0308 commented 7 months ago

@tomvdw Or can we continue using the tfds.builder_from_directory workaround to load datasets from the specified directory?