
RVT: Robotic View Transformer for 3D Object Manipulation
Ankit Goyal, Jie Xu, Yijie Guo, Valts Blukis, Yu-Wei Chao, Dieter Fox
CoRL 2023 (Oral)

If you find our work useful, please consider citing:

@article{goyal2023rvt,
  title={RVT: Robotic View Transformer for 3D Object Manipulation},
  author={Goyal, Ankit and Xu, Jie and Guo, Yijie and Blukis, Valts and Chao, Yu-Wei and Fox, Dieter},
  journal={CoRL},
  year={2023}
}

Getting Started

Install RVT

--- skip if already done while installing Colosseum ---

Once you have downloaded CoppeliaSim, add the following to your ~/.bashrc file. (NOTE: edit the 'EDIT ME' path in the first line.)

export COPPELIASIM_ROOT=<EDIT ME>/PATH/TO/COPPELIASIM/INSTALL/DIR
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$COPPELIASIM_ROOT
export QT_QPA_PLATFORM_PLUGIN_PATH=$COPPELIASIM_ROOT
export DISPLAY=:1.0

Remember to source your .bashrc (source ~/.bashrc) or .zshrc (source ~/.zshrc) after this.
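To sanity-check the setup, you can confirm that the variable resolves and that the launcher script is where you expect it (the coppeliaSim.sh name assumes a standard CoppeliaSim install):

echo $COPPELIASIM_ROOT
ls "$COPPELIASIM_ROOT/coppeliaSim.sh"   # should exist in a standard CoppeliaSim install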

--- skip if already done while installing Colosseum ---

git clone --recurse-submodules git@github.com:robot-colosseum/rvt_colosseum.git && cd rvt_colosseum && git submodule update --init

Now, locally install RVT and other libraries using the following commands. Make sure you are in the root folder of the repository.

pip install -e . 
pip install -e rvt/libs/PyRep 
pip install -e rvt/libs/RLBench 
pip install -e rvt/libs/YARR 
pip install -e rvt/libs/peract_colab
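
As an optional sanity check, you can verify that the editable installs are importable (the module names rvt, pyrep, rlbench, and yarr are the usual ones for these packages; they are an assumption here, not part of the official instructions):

python -c "import rvt, pyrep, rlbench, yarr; print('imports OK')"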

Evaluating on The Colosseum:

Evaluate on pre-trained RVT baseline

You can download a pre-trained RVT agent trained on the 20 RLBench tasks from The Colosseum without any perturbations.
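
A minimal sketch of evaluating such a checkpoint, assuming it is placed under runs/rvt/ together with its config files and reusing the RLBench evaluation command described further below (the checkpoint name model_14.pth is illustrative and may differ for the Colosseum release):

cd rvt
python eval.py --model-folder runs/rvt --eval-datafolder ./data/test --tasks all --eval-episodes 25 --log-name test/1 --device 0 --headless --model-name model_14.pth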

Training RVT on default RLBench dataset

cd rvt
bash run_train.sh

Test RVT on The Colosseum perturbation factors

min_var_num=0
max_var_num=500
total_processes=50
processes_per_gpu=3
bash run_eval_variations.sh $min_var_num $max_var_num $total_processes $processes_per_gpu

This script launches parallel evaluation processes. Adjust the number of parallel processes based on the number of available GPUs and their memory. The task_list inside run_eval_variations.sh can be edited to run a specific task or variation number.
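
For example, a quicker smoke test over a smaller range of variations with fewer parallel processes could look like this (the values are illustrative):

bash run_eval_variations.sh 0 50 10 2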

Using the library:

Training RVT

Default command

To train RVT on all RLBench tasks, use the following command (from folder RVT/rvt):

python train.py --exp_cfg_path configs/all.yaml --device 0,1,2,3,4,5,6,7

We use 8 V100 GPUs. Change the device flag depending on available compute.
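
For example, a sketch of the same command on a machine with only two GPUs (you may also need to lower the batch size via --exp_cfg_opts, described below, to fit in memory):

python train.py --exp_cfg_path configs/all.yaml --device 0,1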

More details about train.py
python train.py --exp_cfg_opts <> --mvt_cfg_opts <> --exp_cfg_path <> --mvt_cfg_path <>

The following command overrides the default experiment parameters with those from the configs/all.yaml file, and additionally overrides the bs (batch size) parameter from the command line.

python train.py --exp_cfg_opts "bs 4" --exp_cfg_path configs/all.yaml --device 0
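
The opts string is typically parsed as space-separated key/value pairs (check train.py for the exact behavior), so several parameters could be overridden at once; for example (bs is taken from the command above, while the epochs key name is a hypothetical illustration and may not match the actual config):

python train.py --exp_cfg_opts "bs 4 epochs 15" --exp_cfg_path configs/all.yaml --device 0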

Evaluate on RLBench

Evaluate RVT on RLBench

Download the pretrained RVT model. Place the model (model_14.pth trained for 15 epochs or 100K steps) and the config files under the folder runs/rvt/. Run evaluation using (from folder RVT/rvt):

python eval.py --model-folder runs/rvt  --eval-datafolder ./data/test --tasks all --eval-episodes 25 --log-name test/1 --device 0 --headless --model-name model_14.pth
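
For reference, a hedged sketch of placing the downloaded files before running the command above (the source paths are placeholders, and the checkpoint ships with its own config .yaml files whose exact names may differ):

mkdir -p runs/rvt
cp /path/to/downloaded/model_14.pth runs/rvt/
cp /path/to/downloaded/*.yaml runs/rvt/
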
Evaluate the official PerAct model on RLBench

Download the officially released PerAct model. Put the downloaded policy under the runs folder with the recommended folder layout: runs/peract_official/seed0. Run the evaluation using:

python eval.py --eval-episodes 25 --peract_official --peract_model_dir runs/peract_official/seed0/weights/600000 --model-name QAttentionAgent_layer0.pt --headless --task all --eval-datafolder ./data/test --device 0 

Gotchas

If you hit Qt plugin errors at runtime (opencv-python bundles its own Qt plugins, which can conflict with CoppeliaSim's), switch to the headless OpenCV build:

pip uninstall opencv-python
pip install opencv-python-headless

FAQs

Q. What is the advantage of RVT over PerAct?

RVT both trains faster and performs better than PerAct.

Q. What resources are required to train RVT?

For training on 18 RLBench tasks, with 100 demos per task, we use 8 V100 GPUs (16 GB memory each). The model trains in ~1 day.

Note that for fair comparison with PerAct, we used the same dataset, which means duplicate keyframes are loaded into the replay buffer. For other datasets, one could consider not doing so, which might further speed up training.

Q. Why do you use pe_fix=True in the rvt config?

For a fair comparison with the official PerAct model, we use this setting. More details about this can be found in the PerAct code. In the future, we recommend using pe_fix=False for language input.

Q. Why are the results for PerAct different from the PerAct paper?

In the PerAct paper, for each task, the best checkpoint is chosen based on the validation set performance. Hence, the model weights can be different for different tasks. We evaluate PerAct and RVT only on the final checkpoint, so that all tasks are strictly evaluated on the same model weights. Note that only the final model for PerAct has been released officially.

Q. Why is there a variance in performance on RLBench even when evaluating the same checkpoint?

We hypothesize that it is because of the sampling-based planner used in RLBench, which could be the source of the randomness. Hence, we evaluate each checkpoint 5 times and report the mean and variance.

Q. Why did you use a cosine decay learning rate scheduler instead of a fixed learning rate schedule as done in PerAct?

We found that the cosine learning rate scheduler led to faster convergence for RVT. Training PerAct with our training hyper-parameters (cosine learning rate scheduler and the same number of iterations) led to worse performance (in ~4 days of training time). Hence, for Fig. 1, we used the official hyper-parameters for PerAct.

Q. For my use case, I want to render images at real camera locations (input camera poses) with PyTorch3D. Is it possible to do so and how can I do that?

Yes, it is possible to do so. A self-sufficient example is present here. Depending on your use case, the code may need to be modified. Also note that 3D augmentation cannot be used while rendering images at real camera locations, as it would change the pose of the camera with respect to the point cloud.

For questions and comments, please contact Ankit Goyal.

Acknowledgement

We sincerely thank the authors of the following repositories for sharing their code.

License

Copyright © 2023, NVIDIA Corporation & affiliates. All rights reserved.

This work is made available under the Nvidia Source Code License. The pretrained RVT model is released under the CC-BY-NC-SA-4.0 license.