vlc-robot / hiveformer

Hiveformer: History-aware instruction-conditioned multi-view transformer for robotic manipulation

This is a PyTorch re-implementation of the Hiveformer paper:

Instruction-driven history-aware policies for robotic manipulations
Pierre-Louis Guhur, Shizhe Chen, Ricardo Garcia, Makarand Tapaswi, Ivan Laptev, Cordelia Schmid
CoRL 2022 (oral)

Prerequisites

  1. Installation

Option 1: Use our pre-built singularity image.

singularity pull library://rjgpinel/rlbench/vlc_rlbench.sif
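
The commands in this README can then be run inside the image; a minimal sketch, assuming the pulled image file is named vlc_rlbench.sif (the --nv flag exposes the host NVIDIA GPUs):

# open a shell inside the container
singularity shell --nv vlc_rlbench.sif

# or run a single command directly
singularity exec --nv vlc_rlbench.sif python train_models.py --exp-config config/transformer_unet.yaml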

Option 2: Install everything from scratch.

conda create --name hiveformer python=3.9
conda activate hiveformer

Follow the instructions in PyRep and RLBench to install the RLBench simulator (use VirtualGL on headless machines). Use our modified version of RLBench to support additional tasks.
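
As a rough sketch of the PyRep environment setup (the CoppeliaSim path below is a placeholder; the PyRep README has the authoritative steps):

# point PyRep at your CoppeliaSim installation (placeholder path)
export COPPELIASIM_ROOT=$HOME/CoppeliaSim_Edu_V4_1_0_Ubuntu20_04
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$COPPELIASIM_ROOT
export QT_QPA_PLATFORM_PLUGIN_PATH=$COPPELIASIM_ROOT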

pip install -r requirements.txt

export PYTHONPATH=$PYTHONPATH:$(pwd)
  2. Dataset generation

Option 1: Use our pre-generated datasets, including keystep trajectories and instruction embeddings.
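
Judging from the paths used by the generation scripts below, the dataset is expected to be laid out roughly as follows (a sketch; exact file names inside each folder may differ):

data/train_dataset/
    microsteps/seed0/      # raw demonstrations (only needed to derive keysteps)
    keysteps/seed0/        # keystep trajectories used for training
    taskvar_instrs/clip/   # precomputed instruction embeddings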

Option 2: Generate the dataset on your own.

seed=0
task=put_knife_on_chopping_board
variation=0
variation_count=1

# 1. generate microstep demonstrations
python preprocess/generate_dataset_microsteps.py \
    --save_path data/train_dataset/microsteps/seed${seed} \
    --all_task_file assets/all_tasks.json \
    --image_size 128,128 --renderer opengl \
    --episodes_per_task 100 \
    --tasks ${task} --variations ${variation_count} --offset ${variation} \
    --processes 1 --seed ${seed}

# 2. generate keystep demonstrations
python preprocess/generate_dataset_keysteps.py \
    --microstep_data_dir data/train_dataset/microsteps/seed${seed} \
    --keystep_data_dir data/train_dataset/keysteps/seed${seed} \
    --tasks ${task}

# 3. (optional) check the correctness of generated keysteps
python preprocess/evaluate_dataset_keysteps.py \
    --microstep_data_dir data/train_dataset/microsteps/seed${seed} \
    --keystep_data_dir data/train_dataset/keysteps/seed${seed} \
    --tasks ${task}

# 4. generate instruction embeddings for the tasks
python preprocess/generate_instructions.py \
    --encoder clip \
    --output_file data/train_dataset/taskvar_instrs/clip
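
To build a multi-task dataset, simply repeat steps 1 and 2 per task; a sketch using the variables defined above:

for task in pick_and_lift pick_up_cup put_knife_on_chopping_board
do
    # step 1: microstep demonstrations for this task
    python preprocess/generate_dataset_microsteps.py \
        --save_path data/train_dataset/microsteps/seed${seed} \
        --all_task_file assets/all_tasks.json \
        --image_size 128,128 --renderer opengl \
        --episodes_per_task 100 \
        --tasks ${task} --variations ${variation_count} --offset ${variation} \
        --processes 1 --seed ${seed}
    # step 2: keystep demonstrations for this task
    python preprocess/generate_dataset_keysteps.py \
        --microstep_data_dir data/train_dataset/microsteps/seed${seed} \
        --keystep_data_dir data/train_dataset/keysteps/seed${seed} \
        --tasks ${task}
done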

Train

Our code supports distributed training with multiple GPUs on SLURM clusters.

For SLURM users, use the following command to launch the training script.

sbatch job_scripts/train_multitask_bc.sh

For non-SLURM users, set the environment variables manually as follows.

export WORLD_SIZE=1
export MASTER_ADDR='localhost'
export MASTER_PORT=10000

export LOCAL_RANK=0 
export RANK=0
export CUDA_VISIBLE_DEVICES=0

python train_models.py --exp-config config/transformer_unet.yaml
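
For multiple GPUs on a single non-SLURM machine, the same variables can be set per process; a sketch for two GPUs, assuming train_models.py reads only these environment variables to initialize the process group:

export WORLD_SIZE=2
export MASTER_ADDR='localhost'
export MASTER_PORT=10000

# launch one process per GPU
RANK=0 LOCAL_RANK=0 CUDA_VISIBLE_DEVICES=0 python train_models.py --exp-config config/transformer_unet.yaml &
RANK=1 LOCAL_RANK=1 CUDA_VISIBLE_DEVICES=1 python train_models.py --exp-config config/transformer_unet.yaml &
wait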

Evaluation

For SLURM users, use the following command to launch the evaluation script.

sbatch job_scripts/eval_tst_split.sh

For non-SLURM users, run the following commands to evaluate the trained model.

# set outdir to the directory of your trained model
export DISPLAY=:0.0 # needed on headless machines

# validation: select the best epoch
for step in {5000..300000..5000}
do
python eval_models.py \
    --exp_config ${outdir}/logs/training_config.yaml \
    --seed 100 --num_demos 20 \
    checkpoint ${outdir}/ckpts/model_step_${step}.pt
done

# run the script to summarize the validation results
python summarize_val_results.py --result_file ${outdir}/preds/seed100/results.jsonl
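
summarize_val_results.py expects a JSON-lines file with one record per evaluated checkpoint; you can peek at the raw records directly (the keys inside each record depend on eval_models.py):

# inspect the first few validation records
head -n 3 ${outdir}/preds/seed100/results.jsonl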

# test: use a different seed from validation
step=300000
python eval_models.py \
    --exp_config ${outdir}/logs/training_config.yaml \
    --seed 200 --num_demos 500 \
    checkpoint ${outdir}/ckpts/model_step_${step}.pt

# run the script to summarize the testing results
python summarize_tst_results.py --result_file ${outdir}/preds/seed200/results.jsonl
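
Putting the paths above together, a trained-model directory ${outdir} is expected to look roughly like this (a sketch inferred from the commands in this README):

${outdir}/
    logs/training_config.yaml     # config written at training time
    ckpts/model_step_5000.pt      # checkpoints saved every 5000 steps
    ckpts/model_step_300000.pt
    preds/seed100/results.jsonl   # validation results
    preds/seed200/results.jsonl   # test results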

We provide trained models (multi-task setting, 10 tasks) on Dropbox. You should obtain results similar to those reported in the paper:

| | pick_and_lift | pick_up_cup | put_knife_on_chopping_board | put_money_in_safe | push_button | reach_target | slide_block_to_target | stack_wine | take_money_out_safe | take_umbrella_out_of_umbrella_stand | Avg. |
|---|---|---|---|---|---|---|---|---|---|---|---|
| seed=0 | 89.00 | 76.80 | 72.80 | 93.00 | 69.60 | 100.00 | 74.20 | 87.20 | 73.20 | 89.80 | 82.56 |
| seed=2 | 91.40 | 75.80 | 76.20 | 81.60 | 86.60 | 100.00 | 85.00 | 89.00 | 72.80 | 79.60 | 83.80 |
| seed=4 | 91.60 | 83.60 | 72.80 | 83.00 | 88.40 | 100.00 | 57.80 | 83.20 | 69.60 | 89.60 | 81.96 |
| Avg. | 90.67 | 78.73 | 73.93 | 85.87 | 81.53 | 100.00 | 72.33 | 86.47 | 71.87 | 86.33 | 82.77 |

We also trained the Hiveformer model on 74 RLBench tasks. In the single-task setting it achieves an average success rate of 66.09%; in the multi-task setting, 49.22%. The multi-task policy is also provided on Dropbox.