metadriverse / policydissect

[NeurIPS 2022] Official implementation of the paper: "Human-AI Shared Control via Policy Dissection"
Apache License 2.0
autonomous-driving decision-making legged-robotics neuroscience reinforcement-learning robotics

Policy Dissection

[NeurIPS 2022] Official implementation of the paper: Human-AI Shared Control via Policy Dissection

Webpage | Code | Video | Paper

In this repo, we provide the implementation of Policy Dissection and some interactive neural controllers enabled by this method.

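Supported Environments: MetaDrive, Pybullet Quadrupedal Robot (A1), IsaacGym Cassie, IsaacGym ANYmal, Box2d-BipedalWalker, Mujoco-Ant and Mujoco-Walker.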

Installation

Basic Installation

# Clone the code to local
git clone https://github.com/metadriverse/policydissect.git
cd policydissect

# Create virtual environment
conda create -n policydissect python=3.7
conda activate policydissect

# install torch
pip3 install torch==1.10.0+cu113 torchvision==0.11.1+cu113 torchaudio==0.10.0+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html

# Install basic dependencies
pip install -e .

IsaacGym Installation (Optional)

To play with the agents trained in IsaacGym, you need to install IsaacGym first.

Please review isaacgym/docs/install.html for installation details, and see the Troubleshooting section below if you run into problems.

Mujoco Installation (Optional)

For playing with the Mujoco-Ant and Mujoco-Walker, please

Play with AI

MetaDrive

To collaborate with the AI driver in MetaDrive environment, run:

# MetaDrive
# Keymap:
# - KEY_W: lane following
# - KEY_A: change to the left lane
# - KEY_S: brake
# - KEY_D: change to the right lane
# - KEY_R: reset
python play/play_metadrive.py

Pybullet Quadrupedal Robot

The quadrupedal robot is trained with the code provided by https://github.com/Mehooz/vision4leg.git. To play with the legged robot, run:

# Pybullet Quadrupedal Robot
# Keymap:
# - KEY_W: forward
# - KEY_A: move left
# - KEY_S: stop
# - KEY_D: move right
# - KEY_R: reset
python play/play_pybullet_a1.py
python play/play_pybullet_a1.py --hard
python play/play_pybullet_a1.py --hard --seed 1001

Adding the --hard flag lets you collaborate with the AI in a harder environment with obstacles and challenging terrain. To switch to a different environment, add --seed followed by an integer seed, e.g. --seed 1001.

Tip: avoid running too fast!

IsaacGym Cassie

The Cassie robot is trained with the code provided by https://github.com/leggedrobotics/legged_gym with a fixed forward command [1, 0, 0], and thus can only move forward. By applying Policy Dissection, primitives related to yaw rate, forward speed, height control and torque force can be identified. Activating these primitives enables various skills such as crouching, forward jumping and back-flipping. Run the following command to play with the robot. Add the --parkour flag to launch a challenging parkour environment.

# Keymap:
# - KEY_W: forward
# - KEY_A: left
# - KEY_S: stop
# - KEY_C: crouch
# - KEY_X: tiptoe
# - KEY_Q: jump
# - KEY_D: right
# - KEY_SPACE: back flip
# - KEY_R: reset
python play/play_cassie.py
python play/play_cassie.py --parkour

Tip: switch to the Tiptoe state before pressing KEY_Q to jump farther.

Note: do not drag or close the pygame window while the program is running.
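As a rough mental model of "activating a primitive", the toy sketch below overrides one hidden unit of a tiny tanh policy while the rest of the forward pass runs unchanged. Everything in it is an assumption for illustration: the two-layer network, its weights, and the `forward`/`overrides` names are hypothetical and are not the code or models shipped in this repo.

```python
import math
from typing import Dict, List, Optional

# Hypothetical toy policy: 2 observations -> 3 tanh hidden units -> 2 actions.
# The weights are invented for illustration, not taken from any released model.
W1 = [[0.5, -0.3], [0.8, 0.1], [-0.2, 0.7]]  # hidden x obs
W2 = [[1.0, -0.5, 0.3], [0.2, 0.9, -1.0]]    # action x hidden


def forward(obs: List[float], overrides: Optional[Dict[int, float]] = None) -> List[float]:
    """Run the toy policy; `overrides` maps hidden-unit index -> forced activation."""
    overrides = overrides or {}
    hidden = []
    for i, row in enumerate(W1):
        natural = math.tanh(sum(w * x for w, x in zip(row, obs)))
        # "Activating a primitive": replace the unit's natural activation
        # with a value chosen by the human, leaving the other units untouched.
        hidden.append(overrides.get(i, natural))
    return [sum(w * h for w, h in zip(row, hidden)) for row in W2]


if __name__ == "__main__":
    obs = [0.1, -0.4]
    print(forward(obs))                      # AI acting on its own
    print(forward(obs, overrides={1: 1.0}))  # human forces unit 1 to 1.0
```

Comparing the two printed action vectors shows how forcing a single unit shifts the action. In the real controllers, which unit and which value evoke a given skill is what the dissection step is meant to identify.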

Gym Environments

We also discovered motor primitives in three Gym environments: Box2d-BipedalWalker, Mujoco-Ant and Mujoco-Walker. You can try them via:

# BipedalWalker
# Keymap:
# - KEY_W: jump
# - KEY_A: front-flip
# - KEY_S: restore running after jumping
# - KEY_R: reset
python play/play_gym_bipedalwalker.py

# Mujoco-Ant
# Keymap:
# - KEY_W: move up
# - KEY_A: move left
# - KEY_S: move down
# - KEY_D: move right
# - KEY_Q: rotation
# - KEY_R: reset
python play/play_mujoco_ant.py

# Mujoco-Walker
# Keymap:
# - KEY_R: reset
# - KEY_A: stop
# - KEY_W: freeze red knee
# - KEY_D: restore running
python play/play_mujoco_walker.py

Comparison with explicit goal-conditioned control

To measure how coarse the control enabled by Policy Dissection is, we train a goal-conditioned quadrupedal ANYmal robot controller with the code provided by https://github.com/leggedrobotics/legged_gym. On top of this controller, we build a primitive-activation conditional control system, in which a PID controller determines the output of the primitive unit according to the tracking error. The resulting system tracks the target yaw command with control precision similar to that of explicitly indicating the goal in the network input. Video is available here.

The experiment script can be found at play/run_tracking_experiment.py. By default, yaw tracking is achieved by explicit goal-conditioned control; running python play/run_tracking_experiment.py --primitive_activation switches to primitive-activation conditional control.
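The closed loop described above can be sketched with a textbook PID controller whose output is written into the primitive unit. This is a minimal sketch under stated assumptions, not the code in play/run_tracking_experiment.py: the gains, the time step, and the linear toy plant (yaw rate proportional to the unit output) are all made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PID:
    """Textbook discrete PID controller; the gains used below are illustrative guesses."""
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: Optional[float] = None

    def step(self, error: float, dt: float) -> float:
        self.integral += error * dt
        # Skip the derivative on the first call to avoid a "derivative kick".
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def track_yaw(target_yaw: float, steps: int = 1000, dt: float = 0.02) -> float:
    """Close the loop on a hypothetical robot and return its final yaw.

    The plant model (yaw_rate = 2.0 * unit_output) is a made-up stand-in
    for the real ANYmal dynamics; it only illustrates the feedback loop.
    """
    pid = PID(kp=2.0, ki=0.1, kd=0.05)
    yaw = 0.0
    for _ in range(steps):
        unit_output = pid.step(target_yaw - yaw, dt)  # value written into the primitive unit
        yaw += 2.0 * unit_output * dt                 # toy plant response
    return yaw


if __name__ == "__main__":
    print(track_yaw(1.0))
```

Skipping the derivative term on the first call is a common way to avoid a large spike when the error jumps from zero; gains for a real robot would have to be tuned on the actual system.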

Policy Dissection Examples

In the example folder, we provide two examples showing how to dissect a policy. The results can be viewed by opening read_result.ipynb with Jupyter Notebook. The identified units are also used as motor primitives for evoking behaviors of the ANYmal and MetaDrive agents; check the previous sections for how to play with them.
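This README does not spell out the dissection criterion itself, so the sketch below swaps in a deliberately simple stand-in: rank hidden units by the absolute Pearson correlation between their recorded activations and one kinematic attribute (for example, yaw rate). It is not the matching rule used by Policy Dissection; the function names and the synthetic traces are hypothetical, and it sticks to the standard library so it also runs in the Python 3.7 environment created above.

```python
import math
from typing import List, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def rank_units(activations: List[List[float]], attribute: List[float]) -> List[int]:
    """Rank hidden units by |correlation| with one kinematic attribute.

    activations[t][i] is unit i's activation at time step t (recorded during
    rollouts); attribute[t] is the attribute value at the same step.
    """
    num_units = len(activations[0])
    scores = [abs(pearson([step[i] for step in activations], attribute))
              for i in range(num_units)]
    return sorted(range(num_units), key=lambda i: scores[i], reverse=True)


# Synthetic traces: unit 2 mirrors the attribute, unit 1 is constant.
attribute = [math.sin(0.1 * t) for t in range(200)]
activations = [[math.cos(0.37 * t), 0.5, -0.8 * math.sin(0.1 * t) + 0.1] for t in range(200)]
print(rank_units(activations, attribute))  # → [2, 0, 1]
```

A negative correlation counts as strongly as a positive one because a unit can drive a behavior by decreasing as well as increasing its activation.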

Troubleshooting

Installing IsaacGym

If you encounter ImportError: libpython3.7m.so.1.0: cannot open shared object file: No such file or directory, run this:

export LD_LIBRARY_PATH=/path/to/libpython/directory
# If you are using Conda, the path should be /path/to/conda/envs/your_env/lib.
# For example:
export LD_LIBRARY_PATH=/home/USERNAME/anaconda3/envs/policydissect/lib
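If you are not sure which directory to export, you can ask the interpreter of the activated environment where its library directory is. A minimal sketch, assuming a typical Linux or conda build where the `LIBDIR` value from Python's `sysconfig` points at the directory holding libpython3.7m.so.1.0 (it falls back to <prefix>/lib otherwise):

```python
import os
import sys
import sysconfig


def libpython_dir() -> str:
    """Directory that should contain libpython (e.g. libpython3.7m.so.1.0)."""
    # LIBDIR is set on typical Linux/conda builds; fall back to <prefix>/lib otherwise.
    return sysconfig.get_config_var("LIBDIR") or os.path.join(sys.prefix, "lib")


if __name__ == "__main__":
    # Prepend rather than overwrite any existing LD_LIBRARY_PATH entries.
    print(f"export LD_LIBRARY_PATH={libpython_dir()}:$LD_LIBRARY_PATH")
```

Run it with the environment's own python and paste the printed export line into your shell. Unlike the plain export above, the printed line keeps any existing LD_LIBRARY_PATH entries.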

If you encounter CalledProcessError: Command '['which', 'c++']' returned non-zero exit status 1., try this:

sudo apt-get install build-essential

If you encounter AttributeError: module 'distutils' has no attribute 'version' from tensorboard, try this:

pip install -U setuptools==50.0.0

Installing Mujoco

If you encounter: fatal error: GL/osmesa.h: No such file or directory:

sudo apt-get install libosmesa6-dev

If you encounter: error: [Errno 2] No such file or directory: 'patchelf': 'patchelf':

sudo apt-get install patchelf

If you encounter: ERROR: GLEW initalization error: Missing GL version:

sudo apt-get install -y libglew-dev

Reference

@inproceedings{
    li2022humanai,
    title={Human-{AI} Shared Control via Policy Dissection},
    author={Quanyi Li and Zhenghao Peng and Haibin Wu and Lan Feng and Bolei Zhou},
    booktitle={Thirty-Sixth Conference on Neural Information Processing Systems},
    year={2022},
    url={https://openreview.net/forum?id=LCOv-GVVDkp}
}