Open nikonikolov opened 2 months ago
Hi there, thank you!
We indeed had difficulties on some of them (which is why they're not uploaded yet) but don't worry they're coming ;)
Among the issues, some of them are massive (>1 TB) and the hub has some limitations in terms of storage and file system. We also faced some issues during video encoding. Right now we are focusing on refactoring LeRobotDataset, so we'll probably add the remaining ones after that. cc @michel-aractingi
Hi, thanks for the reply. I actually did end up converting 90% of the openx datasets using a fork of lerobot. Sharing some findings which I think are important and you might find useful:
lerobot/common/datasets/push_dataset_to_hub/openx/transforms.py needs to be reworked. I haven't gotten through all of it yet, but some of these transforms perform randomization (e.g. for droid), which is not what you want for the raw dataset. I believe these transforms, used by OpenVLA and Octo, are meant for train-time.
success field to filter episodes for training
Happy to share code or discuss further, hope this helps :)
cc @michel-aractingi for visibility.
Also, forgot to mention, some datasets have flipped image channels (BGR instead of RGB). Following https://github.com/kpertsch/rlds_dataset_mod/blob/main/prepare_open_x.sh:
berkeley_autolab_ur5: flip_wrist_image_channels
stanford_hydra_dataset_converted_externally_to_rlds: flip_wrist_image_channels,flip_image_channels
utaustin_mutex: flip_wrist_image_channels,flip_image_channels
berkeley_fanuc_manipulation: flip_wrist_image_channels,flip_image_channels
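For anyone applying this during conversion, here is a minimal sketch of what the flags amount to: reversing the last (channel) axis of the affected images so BGR becomes RGB. The dataset-to-flag mapping is copied from the list above; the observation keys `image` and `wrist_image`, the `FLAG_TO_KEY` table, and the `fix_channels` helper are hypothetical names for illustration and are not taken from lerobot or rlds_dataset_mod. Real key names differ per dataset.

```python
import numpy as np

# Flags per dataset, copied from the list above.
CHANNEL_FLIPS = {
    "berkeley_autolab_ur5": ("flip_wrist_image_channels",),
    "stanford_hydra_dataset_converted_externally_to_rlds": (
        "flip_wrist_image_channels",
        "flip_image_channels",
    ),
    "utaustin_mutex": ("flip_wrist_image_channels", "flip_image_channels"),
    "berkeley_fanuc_manipulation": (
        "flip_wrist_image_channels",
        "flip_image_channels",
    ),
}

# ASSUMPTION: hypothetical observation keys; check each dataset's real keys.
FLAG_TO_KEY = {
    "flip_image_channels": "image",
    "flip_wrist_image_channels": "wrist_image",
}


def fix_channels(dataset_name, observation):
    """Return a copy of `observation` with BGR images converted to RGB.

    Images are expected as arrays of shape (..., H, W, 3). Datasets that
    are not listed in CHANNEL_FLIPS are returned unchanged.
    """
    fixed = dict(observation)
    for flag in CHANNEL_FLIPS.get(dataset_name, ()):
        key = FLAG_TO_KEY[flag]
        if key in fixed:
            # Reverse the channel axis: BGR -> RGB.
            fixed[key] = np.ascontiguousarray(fixed[key][..., ::-1])
    return fixed
```

Usage would look like `obs = fix_channels("utaustin_mutex", obs)` on each step before the frames are written. The flip is deterministic, so unlike the train-time randomization it is safe to bake into the raw dataset.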
Hey @nikonikolov,
Thanks a lot for sharing your notes and experience, it's a huge help!
I am doing a small refactoring of the Open-X dataset scripts and will simplify transforms.py. You're right, I was initially inspired by how OpenVLA handles the Open-X datasets, but I will add all the missing raw data to the datasets.
As part of the refactoring, I think we should have a general RLDS format, with Open-X as a special case of that format. So the format name will probably change to rlds.openx.dataset_name.
Thanks for the great work! I am interested in converting more of the open-x datasets to LeRobotDataset.
LeRobotDataset or is it generally as easy as calling the conversion script?