Perceiver-Actor^2: A Multi-Task Transformer for Bimanual Robotic Manipulation Tasks

This work extends previous work PerAct as well as RLBench for bimanual manipulation tasks.

The repository and documentation are still work in progress.

For the latest updates, see: bimanual.github.io

Installation

Please see Installation for further details.

Prerequisites

The code PerAct^2 is built-off the PerAct which itself is built on the ARM repository by James et al. The prerequisites are the same as PerAct or ARM.

1. Environment

Install miniconda if not already present on the current system. You can use scripts/install_conda.sh for this step:


sudo apt install curl 

curl -L -O https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
chmod +x Miniconda3-latest-Linux-x86_64.sh 
./Miniconda3-latest-Linux-x86_64.sh

SHELL_NAME=`basename $SHELL`
eval "$($HOME/miniconda3/bin/conda shell.${SHELL_NAME} hook)"
conda init ${SHELL_NAME}
conda install mamba -c conda-forge
conda config --set auto_activate_base false

Next, create the rlbench environment and install the dependencies

conda create -n rlbench python=3.8
conda activate rlbench
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia

2. Dependencies

You need to setup RLBench, Pyrep, and YARR. Please note that due to the bimanual functionallity the main repository does not work. You can use scripts/install_dependencies.sh to do so. See Installation for details.

./scripts/install_dependencies.sh

Pre-Generated Datasets

Please checkout the website for pre-generated RLBench demonstrations. If you directly use these datasets, you don't need to run tools/bimanual_data_generator.py from RLBench. Using these datasets will also help reproducibility since each scene is randomly sampled in data_generator_bimanual.py.

Training

Single-GPU Training

To configure and train the model, follow these guidelines:

General Parameters: You can find and modify general parameters in the conf/config.yaml file. This file contains overall settings for the training environment, such as the number of cameras or the the tasks to use.
Method-Specific Parameters: For parameters specific to each method, refer to the corresponding files located in the conf/method directory. These files define configurations tailored to each method's requirements.

When training adjust the replay.batch_size parameter to maximize the utilization of your GPU resources. Increasing this value can improve training efficiency based on the capacity of your available hardware. You can either modify the config files directly or you can pass parameters directly through the command line when running the training script. This allows for quick adjustments without editing configuration files:

python train.py replay.batch_size=3 method=BIMANUAL_PERACT

In this example, the command sets replay.batch_size to 3 and specifies the use of the BIMANUAL_PERACT method for training. Another important parameter to specify the tasks is rlbench.task_name, which sets the overall task, and rlbench.tasks, which is a list of tasks used for training. Note that these can be different for evaluation. A complete set of tasks is shown below:


rlbench:
  task_name: multi
  tasks:
  - coordinated_push_box
  - coordinated_lift_ball
  - dual_push_buttons
  - bimanual_pick_plate
  - coordinated_put_item_in_drawer
  - coordinated_put_bottle_in_fridge
  - handover_item
  - bimanual_pick_laptop
  - bimanual_straighten_rope
  - bimanual_sweep_to_dustpan
  - coordinated_lift_tray
  - handover_item_easy
  - coordinated_take_tray_out_of_oven

Multi-GPU and Multi-Node Training

This repository supports multi-GPU training and distributed training across multiple nodes using PyTorch Distributed Data Parallel (DDP). Follow the instructions below to configure and run training across multiple GPUs and nodes.

Multi-GPU Training on a Single Node

To train using multiple GPUs on a single node, set the parameter ddp.num_devices to the number of GPUs available. For example, if you have 4 GPUs, you can start the training process as follows:

python train.py replay.batch_size=3 method=BIMANUAL_PERACT ddp.num_devices=4

This command will utilize 4 GPUs on the current node for training. Remember to set the replay.batch_size, which is per GPU.

Multi-Node Training Across Different Nodes

If you want to perform distributed training across multiple nodes, you need to set additional parameters: ddp.master_addr and ddp.master_port. These parameters should be configured as follows:

ddp.master_addr: The IP address of the master node (usually the node where the training is initiated). ddp.master_port: A port number to be used for communication across nodes.

Example Command:

python train.py replay.batch_size=3 method=BIMANUAL_PERACT ddp.num_devices=4 ddp.master_addr=192.168.1.1 ddp.master_port=29500

Note: Ensure that all nodes can communicate with each other through the specified IP and port, and that they have the same codebase, data access, and configurations for a successful distributed training run.

Evaluation

Similar to training you can find general parameters in conf/eval.yaml and method specific parameters in the conf/method directory. For each method, you have to set the execution mode in RLBench. For bimanual agents such as BIMANUAL_PERACT or PERACT_BC this is:

rlbench:
  gripper_mode: 'BimanualDiscrete'
  arm_action_mode: 'BimanualEndEffectorPoseViaPlanning'
  action_mode: 'BimanualMoveArmThenGripper'

To generate videos of the current evaluation you can set cinematic_recorder.enabled to True. It is recommended during evalution to disable the recorder, i.e. cinematic_recorder.enabled=False, as rendering the video increases the total evaluation time.

Acknowledgements

This repository uses code from the following open-source projects:

ARM

Original: https://github.com/stepjam/ARM
License: ARM License
Changes: Data loading was modified for PerAct. Voxelization code was modified for DDP training.

PerceiverIO

Original: https://github.com/lucidrains/perceiver-pytorch
License: MIT
Changes: PerceiverIO adapted for 6-DoF manipulation.

ViT

Original: https://github.com/lucidrains/vit-pytorch
License: MIT
Changes: ViT adapted for baseline.

LAMB Optimizer

Original: https://github.com/cybertronai/pytorch-lamb
License: MIT
Changes: None.

OpenAI CLIP

Original: https://github.com/openai/CLIP
License: MIT
Changes: Minor modifications to extract token and sentence features.

Thanks for open-sourcing!

Licenses

PerAct License (Apache 2.0) - Perceiver-Actor Transformer
ARM License - Voxelization and Data Preprocessing
YARR Licence (Apache 2.0)
RLBench Licence
PyRep License (MIT)
Perceiver PyTorch License (MIT)
LAMB License (MIT)
CLIP License (MIT)

Release Notes

Update 2024-10-17

Update Readme

Update 2024-07-10

Initial release

Citations

PerAct^2

@misc{grotz2024peract2,
      title={PerAct2: Benchmarking and Learning for Robotic Bimanual Manipulation Tasks}, 
      author={Markus Grotz and Mohit Shridhar and Tamim Asfour and Dieter Fox},
      year={2024},
      eprint={2407.00278},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2407.00278}, 
}

PerAct

@inproceedings{shridhar2022peract,
  title     = {Perceiver-Actor: A Multi-Task Transformer for Robotic Manipulation},
  author    = {Shridhar, Mohit and Manuelli, Lucas and Fox, Dieter},
  booktitle = {Proceedings of the 6th Conference on Robot Learning (CoRL)},
  year      = {2022},
}

C2FARM

@inproceedings{james2022coarse,
  title={Coarse-to-fine q-attention: Efficient learning for visual robotic manipulation via discretisation},
  author={James, Stephen and Wada, Kentaro and Laidlow, Tristan and Davison, Andrew J},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={13739--13748},
  year={2022}
}

PerceiverIO

@article{jaegle2021perceiver,
  title={Perceiver io: A general architecture for structured inputs \& outputs},
  author={Jaegle, Andrew and Borgeaud, Sebastian and Alayrac, Jean-Baptiste and Doersch, Carl and Ionescu, Catalin and Ding, David and Koppula, Skanda and Zoran, Daniel and Brock, Andrew and Shelhamer, Evan and others},
  journal={arXiv preprint arXiv:2107.14795},
  year={2021}
}

RLBench

@article{james2020rlbench,
  title={Rlbench: The robot learning benchmark \& learning environment},
  author={James, Stephen and Ma, Zicong and Arrojo, David Rovick and Davison, Andrew J},
  journal={IEEE Robotics and Automation Letters},
  volume={5},
  number={2},
  pages={3019--3026},
  year={2020},
  publisher={IEEE}
}

Questions or Issues?

Please file an issue with the issue tracker.

markusgrotz / peract_bimanual

readme