tensorflow / datasets

TFDS is a collection of datasets ready to use with TensorFlow, Jax, ...
https://www.tensorflow.org/datasets
Apache License 2.0

tfds failed to load open-x-embodiment dataset #5392

Closed WesleyHsieh0806 closed 5 months ago

WesleyHsieh0806 commented 6 months ago


Short description: Loading the dataset fractal20220817_data fails.

Environment information

Reproduction instructions

import tensorflow as tf
import tensorflow_datasets as tfds
from tqdm import tqdm

# 66 datasets excluding droid
datasets = [
    'fractal20220817_data',
    'kuka',
    'bridge',
    'taco_play',
    'jaco_play',
    'berkeley_cable_routing',
    'roboturk',
    'nyu_door_opening_surprising_effectiveness',
    'viola',
    'berkeley_autolab_ur5',
    'toto',
    'language_table',
    'columbia_cairlab_pusht_real',
    'stanford_kuka_multimodal_dataset_converted_externally_to_rlds',
    'nyu_rot_dataset_converted_externally_to_rlds',
    'stanford_hydra_dataset_converted_externally_to_rlds',
    'austin_buds_dataset_converted_externally_to_rlds',
    'nyu_franka_play_dataset_converted_externally_to_rlds',
    'maniskill_dataset_converted_externally_to_rlds',
    'furniture_bench_dataset_converted_externally_to_rlds',
    'cmu_franka_exploration_dataset_converted_externally_to_rlds',
    'ucsd_kitchen_dataset_converted_externally_to_rlds',
    'ucsd_pick_and_place_dataset_converted_externally_to_rlds',
    'austin_sailor_dataset_converted_externally_to_rlds',
    'austin_sirius_dataset_converted_externally_to_rlds',
    'bc_z',
    'usc_cloth_sim_converted_externally_to_rlds',
    'utokyo_pr2_opening_fridge_converted_externally_to_rlds',
    'utokyo_pr2_tabletop_manipulation_converted_externally_to_rlds',
    'utokyo_saytap_converted_externally_to_rlds',
    'utokyo_xarm_pick_and_place_converted_externally_to_rlds',
    'utokyo_xarm_bimanual_converted_externally_to_rlds',
    'robo_net',
    'berkeley_mvp_converted_externally_to_rlds',
    'berkeley_rpt_converted_externally_to_rlds',
    'kaist_nonprehensile_converted_externally_to_rlds',
    'stanford_mask_vit_converted_externally_to_rlds',
    'tokyo_u_lsmo_converted_externally_to_rlds',
    'dlr_sara_pour_converted_externally_to_rlds',
    'dlr_sara_grid_clamp_converted_externally_to_rlds',
    'dlr_edan_shared_control_converted_externally_to_rlds',
    'asu_table_top_converted_externally_to_rlds',
    'stanford_robocook_converted_externally_to_rlds',
    'eth_agent_affordances',
    'imperialcollege_sawyer_wrist_cam',
    'iamlab_cmu_pickup_insert_converted_externally_to_rlds',
    'qut_dexterous_manipulation',
    'uiuc_d3field',
    'utaustin_mutex',
    'berkeley_fanuc_manipulation',
    'cmu_playing_with_food',
    'cmu_play_fusion',
    'cmu_stretch',
    'berkeley_gnm_recon',
    'berkeley_gnm_cory_hall',
    'berkeley_gnm_sac_son',
    'robot_vqa',
    'conq_hose_manipulation',
    'dobbe',
    'fmb',
    'io_ai_tech',
    'mimic_play',
    'aloha_mobile',
    'robo_set',
    'tidybot',
    'vima_converted_externally_to_rlds',
]

print('Download {} datasets from Open-X-Embodiement...'.format(len(datasets)))

# Optionally replace the `datasets` list above with the list of filtered datasets from the Google Sheet.
DOWNLOAD_DIR = '~/Open-X-Embodiement'

print(f"Downloading {len(datasets)} datasets to {DOWNLOAD_DIR}.")
for dataset_name in tqdm(datasets):
    # print(tfds.__version__)
    _ = tfds.load(
        dataset_name, data_dir=DOWNLOAD_DIR)


Link to logs

Traceback (most recent call last):
  File "/root/miniconda3/lib/python3.12/site-packages/tensorflow_datasets/core/utils/py_utils.py", line 442, in try_reraise
    yield
  File "/root/miniconda3/lib/python3.12/site-packages/tensorflow_datasets/core/load.py", line 220, in builder
    return cls(**builder_kwargs)  # pytype: disable=not-instantiable
           ^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.12/site-packages/tensorflow_datasets/core/logging/__init__.py", line 289, in decorator
    return function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.12/site-packages/tensorflow_datasets/core/dataset_builder.py", line 1370, in __init__
    super().__init__(**kwargs)
  File "/root/miniconda3/lib/python3.12/site-packages/tensorflow_datasets/core/logging/__init__.py", line 289, in decorator
    return function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.12/site-packages/tensorflow_datasets/core/dataset_builder.py", line 287, in __init__
    self.info.initialize_from_bucket()
    ^^^^^^^^^
  File "/root/miniconda3/lib/python3.12/site-packages/tensorflow_datasets/core/logging/__init__.py", line 169, in __call__
    return function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.12/site-packages/tensorflow_datasets/core/dataset_builder.py", line 482, in info
    info = self._info()
           ^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.12/site-packages/tensorflow_datasets/robotics/dataset_importer_builder.py", line 82, in _info
    features = self.get_ds_builder().info.features
               ^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.12/site-packages/tensorflow_datasets/robotics/dataset_importer_builder.py", line 149, in get_ds_builder
    ds_builder = tfds.builder_from_directory(ds_location)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.12/site-packages/tensorflow_datasets/core/read_only_builder.py", line 150, in builder_from_directory
    return ReadOnlyBuilder(builder_dir=builder_dir)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.12/site-packages/tensorflow_datasets/core/logging/__init__.py", line 289, in decorator
    return function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.12/site-packages/tensorflow_datasets/core/read_only_builder.py", line 66, in __init__
    info_proto = dataset_info.read_proto_from_builder_dir(builder_dir)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.12/site-packages/tensorflow_datasets/core/dataset_info.py", line 1059, in read_proto_from_builder_dir
    return read_from_json(info_path)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.12/site-packages/tensorflow_datasets/core/dataset_info.py", line 1037, in read_from_json
    raise FileNotFoundError(f"Could not load dataset info from {path}") from e
FileNotFoundError: Could not load dataset info from gs:/gresearch/robotics/fractal20220817_data/0.1.0/dataset_info.json

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "./download_v1.py", line 82, in <module>
    _ = tfds.load(
        ^^^^^^^^^^
  File "/root/miniconda3/lib/python3.12/site-packages/tensorflow_datasets/core/logging/__init__.py", line 169, in __call__
    return function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.12/site-packages/tensorflow_datasets/core/load.py", line 641, in load
    dbuilder = _fetch_builder(
               ^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.12/site-packages/tensorflow_datasets/core/load.py", line 496, in _fetch_builder
    return builder(name, data_dir=data_dir, try_gcs=try_gcs, **builder_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.12/contextlib.py", line 81, in inner
    return func(*args, **kwds)
           ^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.12/site-packages/tensorflow_datasets/core/logging/__init__.py", line 169, in __call__
    return function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.12/site-packages/tensorflow_datasets/core/load.py", line 217, in builder
    with py_utils.try_reraise(
  File "/root/miniconda3/lib/python3.12/contextlib.py", line 158, in __exit__
    self.gen.throw(value)
  File "/root/miniconda3/lib/python3.12/site-packages/tensorflow_datasets/core/utils/py_utils.py", line 444, in try_reraise
    reraise(e, *args, **kwargs)
  File "/root/miniconda3/lib/python3.12/site-packages/tensorflow_datasets/core/utils/py_utils.py", line 411, in reraise
    raise exception from e
FileNotFoundError: Failed to construct dataset "fractal20220817_data", builder_kwargs "{'data_dir': '~/Open-X-Embodiement'}": Could not load dataset info from gs:/gresearch/robotics/fractal20220817_data/0.1.0/dataset_info.json

Expected behavior: Each listed dataset downloads successfully.

ccl-core commented 6 months ago

Hi @WesleyHsieh0806, thank you for reporting this issue! We will take a closer look at this.

In the meantime, you should be able to load the dataset with:

ds = tfds.load("fractal20220817_data:0.1.0", data_dir="gs://gresearch/robotics")

or:

ds = tfds.load("robotics:fractal20220817_data:0.1.0")

Thanks!
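
For readers who want to apply the workaround above to more than one dataset, the following is a minimal sketch. The helper names `versioned_name` and `load_from_gcs` are hypothetical, and the default version "0.1.0" is an assumption taken from the single example in this thread; other datasets in the list may be published under a different version.

```python
# Sketch of the workaround: read the dataset directly from the public bucket
# named in the comment above. Helper names are hypothetical.
GCS_DATA_DIR = "gs://gresearch/robotics"


def versioned_name(dataset_name, version="0.1.0"):
    """Build the 'name:version' string passed to tfds.load.

    The default version is an assumption; it is only confirmed in this
    thread for fractal20220817_data.
    """
    return f"{dataset_name}:{version}"


def load_from_gcs(dataset_name, version="0.1.0"):
    """Load one dataset from GCS_DATA_DIR instead of a local data_dir."""
    # Imported lazily so the helpers above work without TFDS installed.
    import tensorflow_datasets as tfds

    return tfds.load(versioned_name(dataset_name, version), data_dir=GCS_DATA_DIR)


# Example (needs network access and tensorflow_datasets):
# ds = load_from_gcs("fractal20220817_data")
```

Pinning the version explicitly makes any failure easier to interpret, because the path that TFDS looks up is then fully determined by the arguments.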

ccl-core commented 6 months ago

Hi @WesleyHsieh0806, a clarifying question: have you by any chance modified your code? The line:

FileNotFoundError: Could not load dataset info from gs:/gresearch/robotics/fractal20220817_data/0.1.0/dataset_info.json

in your stack trace looks very odd: it is not clear where the gs:/ prefix, with only one slash instead of gs://, comes from.
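
As a general illustration only: a single-slash prefix like this is what Python's POSIX path handling produces when a URL is treated as a filesystem path. The thread does not establish that this is what happened inside TFDS or in the reporter's environment; the sketch below just shows the mechanism, with hypothetical helper names.

```python
import posixpath
from pathlib import PurePosixPath

# The URL the error message should have contained (two slashes after "gs:").
GCS_URL = "gs://gresearch/robotics/fractal20220817_data/0.1.0/dataset_info.json"


def collapse_with_normpath(url):
    """POSIX path normalization merges the repeated slash after 'gs:'."""
    return posixpath.normpath(url)


def collapse_with_pathlib(url):
    """Pure POSIX paths drop empty components, with the same effect."""
    return str(PurePosixPath(url))


def has_valid_gcs_scheme(path):
    """True only when the gs:// scheme is intact."""
    return path.startswith("gs://")


print(collapse_with_normpath(GCS_URL))
print(collapse_with_pathlib(GCS_URL))
```

Both calls turn `gs://...` into `gs:/...`, so a check like `has_valid_gcs_scheme` on the final path can help locate the step at which a URL was handled as a local path.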

WesleyHsieh0806 commented 5 months ago

I resolved this issue by using the following command to download the data:

gsutil -m cp -r gs://gdm-robotics-open-x-embodiment/{dataset_name} ~/tensorflow_datasets/
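
The command above uses `{dataset_name}` as a placeholder. A minimal sketch of filling it in for several datasets from Python follows; the helper names `build_copy_command` and `download_all` are hypothetical, and the sketch assumes `gsutil` is installed and on the PATH. Only the bucket name, the flags, and the destination come from the command above.

```python
import os
import subprocess

# Bucket and destination are taken from the command in the comment above.
BUCKET = "gs://gdm-robotics-open-x-embodiment"
DEST_DIR = "~/tensorflow_datasets/"


def build_copy_command(dataset_name, dest_dir=DEST_DIR):
    """Return the gsutil argument list for one dataset.

    The tilde is expanded here because no shell is involved when the
    command is run through subprocess with an argument list.
    """
    source = f"{BUCKET}/{dataset_name}"
    return ["gsutil", "-m", "cp", "-r", source, os.path.expanduser(dest_dir)]


def download_all(dataset_names, dest_dir=DEST_DIR):
    """Copy each dataset in turn, stopping at the first failing command."""
    for name in dataset_names:
        subprocess.run(build_copy_command(name, dest_dir), check=True)


# Example (assumes gsutil is installed):
# download_all(["fractal20220817_data", "bridge"])
```

Passing an argument list rather than a shell string avoids quoting problems with dataset names and keeps the URL from being rewritten by the shell.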

ccl-core commented 5 months ago

Hi @WesleyHsieh0806, thank you for the update! Great to know that you are unblocked now :)

I am closing the bug, but please feel free to reopen it if you encounter any further problems with this dataset. And please feel free to open a PR if you want to contribute to TFDS!