
One Policy to Control Them All:
Shared Modular Policies for Agent-Agnostic Control

ICML 2020

[Project Page](https://huangwl18.github.io/modular-rl/) [Paper] [Demo Video] [Long Oral Talk]

Wenlong Huang1, Igor Mordatch2, Deepak Pathak3,4

1University of California, Berkeley, 2Google Brain, 3Facebook AI Research, 4Carnegie Mellon University

This is a PyTorch-based implementation of our Shared Modular Policies. Instead of the laborious process of training a separate policy for every agent, as in conventional single-agent RL, we explore learning general-purpose controllers for diverse robotic systems: our approach trains a single policy for a wide variety of agents, which can then generalize to unseen agent shapes at test time without any further training.

If you find this work useful in your research, please cite using the following BibTeX:

@inproceedings{huang2020smp,
  Author = {Huang, Wenlong and Mordatch, Igor and Pathak, Deepak},
  Title = {One Policy to Control Them All: Shared Modular Policies for Agent-Agnostic Control},
  Booktitle = {ICML},
  Year = {2020}
}

Setup

Requirements

The setup below assumes Python 3.6; the Python package dependencies are listed in requirements.txt. The provided agents are MuJoCo environments, so a working MuJoCo installation is also required.

Setting up repository

  git clone https://github.com/huangwl18/modular-rl.git
  cd modular-rl/
  python3.6 -m venv mrEnv
  source $PWD/mrEnv/bin/activate

Installing Dependencies

  pip install --upgrade pip
  pip install -r requirements.txt

Running Code

Flags and Parameters Description
| Flag | Description |
|------|-------------|
| --morphologies <List of STRING> | Find existing environments matching each keyword for training (e.g. walker, hopper, humanoid, and cheetah; see the examples below). |
| --custom_xml <PATH> | Path to a custom XML file for training the modular policy. When <PATH> is a file, train with that XML morphology only; when <PATH> is a directory, train on all XML morphologies found in the directory. |
| --td | Enable top-down message passing (pass --td --bu for both-way message passing). |
| --bu | Enable bottom-up message passing (pass --td --bu for both-way message passing). |
| --expID <INT> | Experiment ID for creating the saving directory. |
| --seed <INT> | (Optional) Seed for Gym, PyTorch, and NumPy. |

Train with existing environment
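
As a minimal sketch (assuming the training entry point is a script such as main.py, which is not confirmed by this section), training a both-way message-passing policy on all existing walker variants might look like:

  python main.py --expID 001 --td --bu --morphologies walker --seed 0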

Train with custom environment
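
A corresponding sketch for custom MuJoCo XML morphologies (same assumed main.py entry point; the paths below are illustrative only):

  # train on a single custom morphology
  python main.py --expID 002 --td --bu --custom_xml path/to/my_agent.xml
  # train on all morphologies found in a directory
  python main.py --expID 003 --td --bu --custom_xml path/to/xml_directory/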

Visualization

Provided Environments

Walker

- walker_2_main
- walker_3_main
- walker_4_main
- walker_5_main
- walker_6_main
- walker_7_main
- walker_2_flipped
- walker_3_flipped
- walker_4_flipped
- walker_5_flipped
- walker_6_flipped
- walker_7_flipped

2D Humanoid

- humanoid_2d_7_left_arm
- humanoid_2d_7_left_leg
- humanoid_2d_7_lower_arms
- humanoid_2d_7_right_arm
- humanoid_2d_7_right_leg
- humanoid_2d_8_left_knee
- humanoid_2d_8_right_knee
- humanoid_2d_9_full

Cheetah

- cheetah_2_back
- cheetah_2_front
- cheetah_3_back
- cheetah_3_balanced
- cheetah_3_front
- cheetah_4_allback
- cheetah_4_allfront
- cheetah_4_back
- cheetah_4_front
- cheetah_5_back
- cheetah_5_balanced
- cheetah_5_front
- cheetah_6_back
- cheetah_6_front
- cheetah_7_full

Hopper

- hopper_3
- hopper_4
- hopper_5

Note that each walker agent has an identical counterpart named flipped, for which SMP always flips the torso messages passed to the two legs (e.g. the message passed to the left leg in the main instance is instead passed to the right leg in the flipped instance).

For the results reported in the paper, the following agents are in the held-out set for the corresponding experiments:

All other agents in the corresponding experiments are used for training.

Acknowledgement

The TD3 code is based on this open-source implementation. The code for Dynamic Graph Neural Networks is adapted from Modular Assemblies (Pathak, Lu et al., NeurIPS 2019).