
Maintained Fork

The maintained version of these environments, which includes numerous fixes, comprehensive documentation, support for installation via pip, and support for current versions of Python, is available in Gymnasium Robotics (https://github.com/Farama-Foundation/Gymnasium-Robotics, https://robotics.farama.org/).

Multi-Agent Mujoco

Benchmark for Continuous Multi-Agent Robotic Control, based on OpenAI's Mujoco Gym environments.

Described in the paper Deep Multi-Agent Reinforcement Learning for Decentralized Continuous Cooperative Control by Christian Schroeder de Witt, Bei Peng, Pierre-Alexandre Kamienny, Philip Torr, Wendelin Böhmer, and Shimon Whiteson (Torr Vision Group and Whiteson Research Lab, University of Oxford, 2020).

Installation

Note: this repository requires OpenAI Gym version 0.10.8 and MuJoCo 2.1.

Simply clone this repository and put ./src on your PYTHONPATH. To render, please also set the following environment variables:

export LD_LIBRARY_PATH=${HOME}/.mujoco/mujoco210/bin
export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libGLEW.so
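
If you prefer not to modify PYTHONPATH globally, a minimal alternative sketch (assuming you run Python from the repository root) is to extend sys.path before importing the package. The LD_* variables above must still be exported in the shell before Python starts, since the dynamic linker reads them at process launch:

import sys

# Illustrative alternative to setting PYTHONPATH;
# assumes the repository root is the current working directory.
sys.path.append("./src")

from multiagent_mujoco.mujoco_multi import MujocoMulti  # import now resolves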

Example

from multiagent_mujoco.mujoco_multi import MujocoMulti
import numpy as np
import time

def main():
    env_args = {"scenario": "HalfCheetah-v2",
                "agent_conf": "2x3",
                "agent_obsk": 0,
                "episode_limit": 1000}
    env = MujocoMulti(env_args=env_args)
    env_info = env.get_env_info()

    n_actions = env_info["n_actions"]
    n_agents = env_info["n_agents"]
    n_episodes = 10

    for e in range(n_episodes):
        env.reset()
        terminated = False
        episode_reward = 0

        while not terminated:
            obs = env.get_obs()      # per-agent observations (unused by this random policy)
            state = env.get_state()  # global state (unused by this random policy)

            actions = []
            for agent_id in range(n_agents):
                # Sample a uniformly random continuous action for each agent.
                action = np.random.uniform(-1.0, 1.0, n_actions)
                actions.append(action)

            reward, terminated, _ = env.step(actions)
            episode_reward += reward

            env.render()
            time.sleep(0.1)  # slow down rendering

        print("Total reward in episode {} = {}".format(e, episode_reward))

    env.close()

if __name__ == "__main__":
    main()

Documentation

Environment config

All tasks are configured through the env_args dictionary passed to MujocoMulti (see the example above). Its fields are scenario (the underlying Gym environment), agent_conf (how the robot's joints are partitioned into agents, e.g. "2x3" for two agents with three joints each), agent_obsk (each agent observes joints up to this many hops away in the robot's joint graph), and episode_limit (the maximum episode length).

Extending Tasks

Tasks can be trivially extended by adding entries in src/multiagent_mujoco/obsk.py.
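
For orientation, a new entry follows the pattern of those already in obsk.py, which describe a robot as joint Nodes connected by HyperEdges and partition the nodes into agents. The sketch below is purely illustrative: the labels and qpos/qvel/actuator indices are placeholders for a hypothetical two-joint robot, and it assumes the Node and HyperEdge helpers keep their existing constructor signatures.

from multiagent_mujoco.obsk import Node, HyperEdge

# Hypothetical two-joint robot; all indices below are placeholders.
joint0 = Node("joint0", -2, -2, 0)   # label, qpos index, qvel index, actuator index
joint1 = Node("joint1", -1, -1, 1)

parts = [(joint0,), (joint1,)]       # one tuple of nodes per agent
edges = [HyperEdge(joint0, joint1)]  # physical adjacency, used for k-hop observations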

Task configuration

Unless stated otherwise, all parameters given below are to be used with multiagent_mujoco.mujoco_multi.MujocoMulti.

2-Agent Ant

env_args.scenario="Ant-v2"
env_args.agent_conf="2x4"
env_args.agent_obsk=1
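
These listings map directly onto the env_args dictionary from the example above. For instance, the 2-Agent Ant configuration (the episode_limit value here is illustrative):

env_args = {"scenario": "Ant-v2",
            "agent_conf": "2x4",
            "agent_obsk": 1,
            "episode_limit": 1000}
env = MujocoMulti(env_args=env_args)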

2-Agent Ant Diag

env_args.scenario="Ant-v2"
env_args.agent_conf="2x4d"
env_args.agent_obsk=1

4-Agent Ant

env_args.scenario="Ant-v2"
env_args.agent_conf="4x2"
env_args.agent_obsk=1

2-Agent HalfCheetah

env_args.scenario="HalfCheetah-v2"
env_args.agent_conf="2x3"
env_args.agent_obsk=1

6-Agent HalfCheetah

env_args.scenario="HalfCheetah-v2"
env_args.agent_conf="6x1"
env_args.agent_obsk=1

3-Agent Hopper

env_args.scenario="Hopper-v2"
env_args.agent_conf="3x1"
env_args.agent_obsk=1

2-Agent Humanoid

env_args.scenario="Humanoid-v2"
env_args.agent_conf="9|8"
env_args.agent_obsk=1

2-Agent HumanoidStandup

env_args.scenario="HumanoidStandup-v2"
env_args.agent_conf="9|8"
env_args.agent_obsk=1

2-Agent Reacher

env_args.scenario="Reacher-v2"
env_args.agent_conf="2x1"
env_args.agent_obsk=1

2-Agent Swimmer

env_args.scenario="Swimmer-v2"
env_args.agent_conf="2x1"
env_args.agent_obsk=1

2-Agent Walker

env_args.scenario="Walker2d-v2"
env_args.agent_conf="2x3"
env_args.agent_obsk=1

Manyagent Swimmer

env_args.scenario="manyagent_swimmer"
env_args.agent_conf="10x2"
env_args.agent_obsk=1

Manyagent Ant

env_args.scenario="manyagent_ant"
env_args.agent_conf="2x3"
env_args.agent_obsk=1

Coupled HalfCheetah (NEW!)

env_args.scenario="coupled_half_cheetah"
env_args.agent_conf="1p1"
env_args.agent_obsk=1

CoupledHalfCheetah features two separate HalfCheetah agents coupled by an elastic tendon. You can add more tendons or novel coupled scenarios by:

  1. Creating a new Gym environment that defines the reward function of the coupled scenario (consult coupled_half_cheetah.py)
  2. Creating a new Mujoco environment XML file that inserts the agents and tendons (see assets/coupled_half_cheetah.xml)
  3. Registering your environment as a scenario in the MujocoMulti environment (only needed if you require special default observability parameters)
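
Once a coupled scenario is defined and registered, it is instantiated like any other task; for example, the shipped CoupledHalfCheetah with the configuration listed above (episode_limit is illustrative):

env_args = {"scenario": "coupled_half_cheetah",
            "agent_conf": "1p1",
            "agent_obsk": 1,
            "episode_limit": 1000}
env = MujocoMulti(env_args=env_args)
print(env.get_env_info()["n_agents"])  # expected: 2, one agent per cheetah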