Assuming I wanted to train for different tasks with a different gripper, for example the one from a Franka Emika Panda: would it be possible to train and reproduce the method if I had a dataset with all the correct information (30 fps video, 6D pose, and gripper opening angle)? Or is the network completely dependent on the gripper type used here?

Do I need the .r3d files for training, or do I only need the .mp4 videos with the appropriate labels?
Hello!
a. Our image encoder was trained on data collected with our gripper (see `imitation-in-homes/configs/model/resnet_dobb-e.yaml`). So, you may want to swap it out with your image encoder of choice (we haven't tried using our encoder on a different gripper, but using a ResNet with ImageNet pre-trained weights could work).
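For instance, a minimal sketch of such a swap, using a torchvision ResNet-34 with ImageNet weights (the class and its wiring are illustrative, not the repo's actual encoder interface):

```python
import torch
import torch.nn as nn
from torchvision.models import resnet34, ResNet34_Weights


class ImageNetResNetEncoder(nn.Module):
    """Illustrative drop-in image encoder: an ImageNet-pretrained
    ResNet-34 with the classifier head removed, yielding a flat
    feature vector per frame."""

    def __init__(self):
        super().__init__()
        backbone = resnet34(weights=ResNet34_Weights.IMAGENET1K_V1)
        self.feature_dim = backbone.fc.in_features  # 512 for ResNet-34
        backbone.fc = nn.Identity()  # strip the ImageNet classification head
        self.backbone = backbone

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 3, H, W) normalized RGB frames
        return self.backbone(x)


encoder = ImageNetResNetEncoder()
features = encoder(torch.randn(2, 3, 224, 224))  # -> shape (2, 512)
```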
b. As long as the data is in the correct format, having .mp4 videos with appropriate labels is sufficient for training.
The `rgb_rel_videos_exported.txt` file can be empty (it just needs to exist). However, our dataloader takes in quaternions, which are then converted to axis-angle, so our code expects the rotation labels to be in quaternion format (a conversion sketch is below).
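For reference, the quaternion-to-axis-angle conversion amounts to roughly the following (a standalone numpy sketch assuming (x, y, z, w) ordering; the actual dataloader code and convention may differ):

```python
import numpy as np


def quaternion_to_axis_angle(q: np.ndarray) -> np.ndarray:
    """Convert a unit quaternion (x, y, z, w) to an axis-angle vector
    whose direction is the rotation axis and whose norm is the angle
    in radians."""
    q = q / np.linalg.norm(q)      # guard against drift from unit norm
    xyz, w = q[:3], q[3]
    sin_half = np.linalg.norm(xyz)
    angle = 2.0 * np.arctan2(sin_half, w)
    if sin_half < 1e-8:            # near-zero rotation: axis is undefined
        return np.zeros(3)
    return (xyz / sin_half) * angle


# Example: a 90-degree rotation about the z-axis
q = np.array([0.0, 0.0, np.sin(np.pi / 4), np.cos(np.pi / 4)])
print(quaternion_to_axis_angle(q))  # approx [0. 0. 1.5708]
```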
Furthermore, training a VQ-BeT policy is a two-step process. First, set the dataset roots in `configs/env_vars/env_vars.yaml`. Then, set `include_task` to the task name (e.g. `Bag_Pick_Up`) in `train_rvq.yaml` and train with `python train.py --config-name=train_rvq`.
Second, set `include_task` and also `vqvae_load_dir` in `train_vqbet.yaml`, with the latter pointing to the `checkpoint.pt` path of the RVQ you just trained. Then, train with `python train.py --config-name=train_vqbet`.
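Concretely, the two step-2 overrides could look like this (an illustrative excerpt showing only the keys named above; the checkpoint path is a placeholder, and the real config contains more fields):

```yaml
# train_vqbet.yaml (illustrative excerpt)
include_task: Bag_Pick_Up
vqvae_load_dir: /path/to/rvq_run/checkpoint.pt  # RVQ checkpoint from step 1
```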
If you have multiple GPUs, you can also train in parallel with `accelerate launch --config_file configs/accelerate/accel_cfg.yaml train.py --config-name=train_vqbet`, changing `num_processes` in `configs/accelerate/accel_cfg.yaml` accordingly.
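For reference, the multi-GPU knobs in an accelerate config file look like this (an illustrative excerpt using standard accelerate keys; the repo's actual `accel_cfg.yaml` contains more fields, and only `num_processes` should need changing):

```yaml
# configs/accelerate/accel_cfg.yaml (illustrative excerpt)
distributed_type: MULTI_GPU
num_processes: 4        # set to the number of GPUs you want to use
mixed_precision: "no"
```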
If you have any further questions, let us know.