[NeurIPS 2022] Official implementation of the paper: Human-AI Shared Control via Policy Dissection
Webpage | Code | Video | Paper |
In this repo, we provide the implementation of Policy Dissection and some interactive neural controllers enabled by this method.
Supported Environments:
# Clone the code to local
git clone https://github.com/metadriverse/policydissect.git
cd policydissect
# Create virtual environment
conda create -n policydissect python=3.7
conda activate policydissect
# install torch
pip3 install torch==1.10.0+cu113 torchvision==0.11.1+cu113 torchaudio==0.10.0+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html
# Install basic dependency
pip install -e .
For playing with agents trained in IsaacGym, follow the instructions below to install IsaacGym
isaacgym/python && pip install -e .
Please review the file isaacgym/docs/install.html
for more information on installation.
See the Troubleshooting section for debugging.
For playing with the Mujoco-Ant and Mujoco-Walker, please
~/.mujoco/mujoco200
~/.mujoco/
.bashrc
: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/USERNAME/.mujoco/mujoco200/bin
pip install mujoco-py==2.0.2.7
(Solutions of compiling error can be easily found at https://github.com/openai/mujoco-py/issues)To collaborate with the AI driver in MetaDrive environment, run:
# MetaDrive
# Keymap:
# - KEY_W: lane following
# - KEY_A: left lane changing
# - KEY_S: braking
# - KEY_D: right lane changing
# - KEY_R:Reset
python play/play_metadrive.py
The quadrupedal robot is trained with the code provided by https://github.com/Mehooz/vision4leg.git. For playing with legged robot, run:
# Pybullet Quadrupedal Robot
# Keymap:
# - KEY_W: forward
# - KEY_A: moving left
# - KEY_S: stop
# - KEY_D: moving right
# - KEY_R: reset
python play/play_pybullet_a1.py
python play/play_pybullet_a1.py --hard
python play/play_pybullet_a1.py --hard --seed 1001
Also, you can collaborate with AI and challenge the hard environment consisting of obstacles and challenging terrains by
adding --hard
flag. You can change to a different environment by adding --seed your_seed_int_type
.
tips: Avoid running fast!
The Cassie robot is trained with the code provided by https://github.com/leggedrobotics/legged_gym with a fixed forward
command [1, 0, 0]
, and thus can only move forward. By applying Policy Dissection, primitives related to yaw
rate, forward speed, height control and torque force can be identified. Activating these primitives
enable various skills like crouching, forward jumping, back-flipping and so on.
Run the following command to play with the robot. Add flag--parkour
to launch a challenging parkour environment.
# Keymap:
# - KEY_W:Forward
# - KEY_A:Left
# - KEY_S:Stop
# - KEY_C:Crouch
# - KEY_X:Tiptoe
# - KEY_Q:Jump
# - KEY_D:Right
# - KEY_SPACE:Back Flip
# - KEY_R:Reset
python play/play_cassie.py
python play/play_cassie.py --parkour
tips: Switch to Tiptoe state before pressing Key_Q to increase the distance of jump.
Note Do not draw the windows or close the pygame window during running.
We also discover motor primitives in three gym environments: Box2d-BipedalWalker, Mujoco-Ant and Mujoco-Walker. You can try them via:
# BipedalWalker
# Keymap:
# - KEY_W: jump
# - KEY_A: front-flip
# - KEY_S: restore running after jumping
# - KEY_R: reset
python play/play_gym_bipedalwalker.py
# Mujoco-Ant
# Keymap:
# - KEY_W: move up
# - KEY_A: move left
# - KEY_S: move down
# - KEY_D: move right
# - KEY_Q: rotation
# - KEY_R: reset
python play/play_mujoco_ant.py
# Mujoco-Walker
# Keymap:
# - KEY_R: reset
# - KEY_A: stop
# - KEY_W: freeze red knee
# - KEY_D: restore running
python play/play_mujoco_walker.py
To measure the coarseness of the control approach enabled by Policy Dissection, we train a goal-conditioned quadrupedal ANYmal robot controller with code provided by https://github.com/leggedrobotics/legged_gym. We build primitive-activation conditional control system on this controller with a PID controller determining the unit output according to the tracking error. As a result, it can track the target yaw command and can achieve the similar control precision, compared to explicitly indicating the goal in the network input. Video is available here.
The experiment script can be found at play/run_tracking_experiment.py
.
The default yaw tracking is achieved by explicit goal-conditioned control, while
running python play/run_tracking_experiment.py --primitive_activation
will change to primitive-activation conditional control.
In example
folder, we provide two examples showing how to dissect policy. The results can be read by opening
read_result.ipynb
with jupyter notebook. Also, the identified units are chosen as motor primitives for evoking
behaviors of Anymal and the MetaDrive agents. Check previous section about how to play with them.
If you encounter ImportError: libpython3.7m.so.1.0: cannot open shared object file: No such file or directory
, run
this:
export LD_LIBRARY_PATH=/path/to/libpython/directory
# If you are using Conda, the path should be /path/to/conda/envs/your_env/lib.
# For example:
export LD_LIBRARY_PATH=/home/USERNAME/anaconda3/envs/policydissect/lib
If you encounter CalledProcessError: Command '['which', 'c++']' returned non-zero exit status 1.
, try this:
sudo apt-get install build-essential
If you encounter AttributeError: module 'distutils' has no attribute 'version'
from tensorboard,
try this:
pip install -U setuptools==50.0.0
If you encounter: fatal error: GL/osmesa.h: No such file or directory
:
sudo apt-get install libosmesa6-dev
If you encounter: error: [Errno 2] No such file or directory: 'patchelf': 'patchelf'
:
sudo apt-get install patchelf
If you encounter: ERROR: GLEW initalization error: Missing GL version
:
sudo apt-get install -y libglew-dev
@inproceedings{
li2022humanai,
title={Human-{AI} Shared Control via Policy Dissection},
author={Quanyi Li and Zhenghao Peng and Haibin Wu and Lan Feng and Bolei Zhou},
booktitle={Thirty-Sixth Conference on Neural Information Processing Systems},
year={2022},
url={https://openreview.net/forum?id=LCOv-GVVDkp}
}