tensorflow / datasets

TFDS is a collection of datasets ready to use with TensorFlow, Jax, ...
https://www.tensorflow.org/datasets
Apache License 2.0

HTTP Error 301 #5360

Open yepw opened 4 months ago

yepw commented 4 months ago

Short description

Loading the dataset "berkeley_autolab_ur5" fails with an HTTP 301 error. I did not run it on Colab; I tried to download the dataset locally. I tried to disable GCS following the comments here.

Environment information

Reproduction instructions

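import os
import tensorflow_datasets as tfds

# Attempt to disable GCS access (private TFDS flag) and skip the GCE metadata check.
tfds.core.utils.gcs_utils._is_gcs_disabled = True
os.environ['NO_GCE_CHECK'] = 'true'

# Bind the result to a new name so the tfds module is not shadowed.
ds = tfds.load('berkeley_autolab_ur5')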

Link to logs

2024-04-10 12:13:41.581284: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-04-10 12:13:41.581593: I external/local_xla/xla/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.
2024-04-10 12:13:41.583676: I external/local_xla/xla/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.
2024-04-10 12:13:41.610749: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-04-10 12:13:42.168805: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-04-10 12:13:42.461787: I external/local_tsl/tsl/platform/cloud/google_auth_provider.cc:180] Attempting an empty bearer token since no token was retrieved from files, and GCE metadata check was skipped.
2024-04-10 12:13:44.161517: I external/local_tsl/tsl/platform/cloud/google_auth_provider.cc:180] Attempting an empty bearer token since no token was retrieved from files, and GCE metadata check was skipped.
2024-04-10 12:13:46.965011: I external/local_tsl/tsl/platform/cloud/google_auth_provider.cc:180] Attempting an empty bearer token since no token was retrieved from files, and GCE metadata check was skipped.
2024-04-10 12:13:51.745913: I external/local_tsl/tsl/platform/cloud/google_auth_provider.cc:180] Attempting an empty bearer token since no token was retrieved from files, and GCE metadata check was skipped.
2024-04-10 12:13:59.867438: I external/local_tsl/tsl/platform/cloud/google_auth_provider.cc:180] Attempting an empty bearer token since no token was retrieved from files, and GCE metadata check was skipped.
2024-04-10 12:14:16.302598: I external/local_tsl/tsl/platform/cloud/google_auth_provider.cc:180] Attempting an empty bearer token since no token was retrieved from files, and GCE metadata check was skipped.
2024-04-10 12:14:49.168077: I external/local_tsl/tsl/platform/cloud/google_auth_provider.cc:180] Attempting an empty bearer token since no token was retrieved from files, and GCE metadata check was skipped.
2024-04-10 12:15:22.071999: I external/local_tsl/tsl/platform/cloud/google_auth_provider.cc:180] Attempting an empty bearer token since no token was retrieved from files, and GCE metadata check was skipped.
2024-04-10 12:15:54.555505: I external/local_tsl/tsl/platform/cloud/google_auth_provider.cc:180] Attempting an empty bearer token since no token was retrieved from files, and GCE metadata check was skipped.
2024-04-10 12:16:26.999251: I external/local_tsl/tsl/platform/cloud/google_auth_provider.cc:180] Attempting an empty bearer token since no token was retrieved from files, and GCE metadata check was skipped.
2024-04-10 12:16:59.968858: I external/local_tsl/tsl/platform/cloud/google_auth_provider.cc:180] Attempting an empty bearer token since no token was retrieved from files, and GCE metadata check was skipped.
Traceback (most recent call last):
  File "/home/yeping/anaconda3/envs/tf-n/lib/python3.11/site-packages/tensorflow_datasets/core/utils/py_utils.py", line 436, in try_reraise
    yield
  File "/home/yeping/anaconda3/envs/tf-n/lib/python3.11/site-packages/tensorflow_datasets/core/load.py", line 222, in builder
    return cls(**builder_kwargs)  # pytype: disable=not-instantiable
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/yeping/anaconda3/envs/tf-n/lib/python3.11/site-packages/tensorflow_datasets/core/logging/__init__.py", line 288, in decorator
    return function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yeping/anaconda3/envs/tf-n/lib/python3.11/site-packages/tensorflow_datasets/core/dataset_builder.py", line 1370, in __init__
    super().__init__(**kwargs)
  File "/home/yeping/anaconda3/envs/tf-n/lib/python3.11/site-packages/tensorflow_datasets/core/logging/__init__.py", line 288, in decorator
    return function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yeping/anaconda3/envs/tf-n/lib/python3.11/site-packages/tensorflow_datasets/core/dataset_builder.py", line 287, in __init__
    self.info.initialize_from_bucket()
    ^^^^^^^^^
  File "/home/yeping/anaconda3/envs/tf-n/lib/python3.11/site-packages/tensorflow_datasets/core/logging/__init__.py", line 168, in __call__
    return function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yeping/anaconda3/envs/tf-n/lib/python3.11/site-packages/tensorflow_datasets/core/dataset_builder.py", line 482, in info
    info = self._info()
           ^^^^^^^^^^^^
  File "/home/yeping/anaconda3/envs/tf-n/lib/python3.11/site-packages/tensorflow_datasets/robotics/dataset_importer_builder.py", line 82, in _info
    features = self.get_ds_builder().info.features
               ^^^^^^^^^^^^^^^^^^^^^
  File "/home/yeping/anaconda3/envs/tf-n/lib/python3.11/site-packages/tensorflow_datasets/robotics/dataset_importer_builder.py", line 149, in get_ds_builder
    ds_builder = tfds.builder_from_directory(ds_location)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yeping/anaconda3/envs/tf-n/lib/python3.11/site-packages/tensorflow_datasets/core/read_only_builder.py", line 150, in builder_from_directory
    return ReadOnlyBuilder(builder_dir=builder_dir)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yeping/anaconda3/envs/tf-n/lib/python3.11/site-packages/tensorflow_datasets/core/logging/__init__.py", line 288, in decorator
    return function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yeping/anaconda3/envs/tf-n/lib/python3.11/site-packages/tensorflow_datasets/core/read_only_builder.py", line 66, in __init__
    info_proto = dataset_info.read_proto_from_builder_dir(builder_dir)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yeping/anaconda3/envs/tf-n/lib/python3.11/site-packages/tensorflow_datasets/core/dataset_info.py", line 1059, in read_proto_from_builder_dir
    return read_from_json(info_path)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yeping/anaconda3/envs/tf-n/lib/python3.11/site-packages/tensorflow_datasets/core/dataset_info.py", line 1035, in read_from_json
    json_str = epath.Path(path).read_text()
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yeping/anaconda3/envs/tf-n/lib/python3.11/site-packages/etils/epath/abstract_path.py", line 157, in read_text
    return f.read()
           ^^^^^^^^
  File "/home/yeping/anaconda3/envs/tf-n/lib/python3.11/site-packages/tensorflow/python/lib/io/file_io.py", line 118, in read
    length = self.size() - self.tell()
             ^^^^^^^^^^^
  File "/home/yeping/anaconda3/envs/tf-n/lib/python3.11/site-packages/tensorflow/python/lib/io/file_io.py", line 96, in size
    return stat(self.__name).length
           ^^^^^^^^^^^^^^^^^
  File "/home/yeping/anaconda3/envs/tf-n/lib/python3.11/site-packages/tensorflow/python/lib/io/file_io.py", line 908, in stat
    return stat_v2(filename)
           ^^^^^^^^^^^^^^^^^
  File "/home/yeping/anaconda3/envs/tf-n/lib/python3.11/site-packages/tensorflow/python/lib/io/file_io.py", line 924, in stat_v2
    return _pywrap_file_io.Stat(compat.path_to_str(path))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
tensorflow.python.framework.errors_impl.AbortedError: All 10 retry attempts failed. The last failure: Error executing an HTTP request: HTTP response code 301 with body '<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
<A HREF="https://www.googleapis.com/storage/v1/b/gresearch/o/robotics%2Fberkeley_autolab_ur5%2F0.1.0%2Fdataset_info.json?fields=size%2Cgeneration%2Cupdated">here</A>.
</BODY></HTML>
'
     when reading metadata of gs://gresearch/robotics/berkeley_autolab_ur5/0.1.0/dataset_info.json

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/yeping/anaconda3/envs/tf-n/lib/python3.11/site-packages/tensorflow_datasets/core/logging/__init__.py", line 168, in __call__
    return function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yeping/anaconda3/envs/tf-n/lib/python3.11/site-packages/tensorflow_datasets/core/load.py", line 643, in load
    dbuilder = _fetch_builder(
               ^^^^^^^^^^^^^^^
  File "/home/yeping/anaconda3/envs/tf-n/lib/python3.11/site-packages/tensorflow_datasets/core/load.py", line 498, in _fetch_builder
    return builder(name, data_dir=data_dir, try_gcs=try_gcs, **builder_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yeping/anaconda3/envs/tf-n/lib/python3.11/contextlib.py", line 81, in inner
    return func(*args, **kwds)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/yeping/anaconda3/envs/tf-n/lib/python3.11/site-packages/tensorflow_datasets/core/logging/__init__.py", line 168, in __call__
    return function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yeping/anaconda3/envs/tf-n/lib/python3.11/site-packages/tensorflow_datasets/core/load.py", line 219, in builder
    with py_utils.try_reraise(
  File "/home/yeping/anaconda3/envs/tf-n/lib/python3.11/contextlib.py", line 158, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/home/yeping/anaconda3/envs/tf-n/lib/python3.11/site-packages/tensorflow_datasets/core/utils/py_utils.py", line 438, in try_reraise
    reraise(e, *args, **kwargs)
  File "/home/yeping/anaconda3/envs/tf-n/lib/python3.11/site-packages/tensorflow_datasets/core/utils/py_utils.py", line 405, in reraise
    raise exception from e
RuntimeError: AbortedError: Failed to construct dataset "berkeley_autolab_ur5", builder_kwargs "{'data_dir': None}": All 10 retry attempts failed. The last failure: Error executing an HTTP request: HTTP response code 301 with body '<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
<A HREF="https://www.googleapis.com/storage/v1/b/gresearch/o/robotics%2Fberkeley_autolab_ur5%2F0.1.0%2Fdataset_info.json?fields=size%2Cgeneration%2Cupdated">here</A>.
</BODY></HTML>
'
     when reading metadata of gs://gresearch/robotics/berkeley_autolab_ur5/0.1.0/dataset_info.json
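
As a reading aid for the error above, the redirect target in the 301 body can be decoded to see which object was being requested. The sketch below assumes the URL follows the Cloud Storage JSON API layout `/storage/v1/b/<bucket>/o/<object>`; the helper name `describe_gcs_metadata_url` is hypothetical and not part of TFDS or TensorFlow.

```python
from urllib.parse import parse_qs, unquote, urlsplit

# Redirect target copied verbatim from the 301 response body in the log.
REDIRECT_URL = (
    "https://www.googleapis.com/storage/v1/b/gresearch/o/"
    "robotics%2Fberkeley_autolab_ur5%2F0.1.0%2Fdataset_info.json"
    "?fields=size%2Cgeneration%2Cupdated"
)


def describe_gcs_metadata_url(url: str) -> dict:
    """Split a GCS JSON API metadata URL into bucket, object, and fields.

    Hypothetical helper; assumes the path looks like
    /storage/v1/b/<bucket>/o/<percent-encoded object name>.
    """
    parts = urlsplit(url)
    segments = parts.path.split("/")
    bucket = segments[segments.index("b") + 1]
    # The object name is percent-encoded into a single path segment.
    object_name = unquote(segments[segments.index("o") + 1])
    # parse_qs already percent-decodes the query values.
    fields = parse_qs(parts.query).get("fields", [""])[0].split(",")
    return {"bucket": bucket, "object": object_name, "fields": fields}


info = describe_gcs_metadata_url(REDIRECT_URL)
gs_path = f"gs://{info['bucket']}/{info['object']}"
```

Here `gs_path` comes out as the same `gs://gresearch/robotics/berkeley_autolab_ur5/0.1.0/dataset_info.json` named in the last line of the error, so the redirect points at the metadata of the very file TensorFlow was trying to stat.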

Expected behavior

It starts downloading the dataset.

marcenacp commented 4 months ago

It seems you cannot disable GCS for this dataset as it downloads all files from the buckets (source). Can you download the dataset from GCS and build it using Beam?
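
One way to check whether the file is reachable outside TensorFlow's GCS filesystem is to fetch it over plain HTTPS. This is a minimal sketch, assuming the object is publicly readable and served from the standard `https://storage.googleapis.com/<bucket>/<object>` endpoint; the helper names `gs_to_https` and `fetch_dataset_info` are hypothetical, not TFDS APIs.

```python
import json
import urllib.parse
import urllib.request


def gs_to_https(gs_uri: str) -> str:
    """Map a gs://bucket/object URI to its storage.googleapis.com URL."""
    prefix = "gs://"
    if not gs_uri.startswith(prefix):
        raise ValueError(f"not a gs:// URI: {gs_uri!r}")
    bucket, _, object_name = gs_uri[len(prefix):].partition("/")
    if not bucket or not object_name:
        raise ValueError(f"missing bucket or object in: {gs_uri!r}")
    # quote() leaves "/" unescaped, which this endpoint accepts in object paths.
    return f"https://storage.googleapis.com/{bucket}/{urllib.parse.quote(object_name)}"


def fetch_dataset_info(gs_uri: str, timeout: float = 30.0) -> dict:
    """Download and parse a dataset_info.json (assumes public read access)."""
    with urllib.request.urlopen(gs_to_https(gs_uri), timeout=timeout) as resp:
        return json.load(resp)


INFO_URI = "gs://gresearch/robotics/berkeley_autolab_ur5/0.1.0/dataset_info.json"
info_url = gs_to_https(INFO_URI)
```

Calling `fetch_dataset_info(INFO_URI)` needs network access, so it is left out of the block. A full local copy would also need every other file in the `0.1.0` directory, not just `dataset_info.json`. The traceback shows TFDS opening that directory with `tfds.builder_from_directory`, so pointing the same call at a local copy may work, but that is untested here.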

lgeiger commented 2 months ago

I am running into the same issue with the latest TF nightly, but this seems unrelated to TFDS. I opened an issue for this on upstream TF: https://github.com/tensorflow/tensorflow/issues/69789

@marcenacp Could you have a look at the upstream issue and forward it to the relevant TF team?