A maintained version of these environments, with numerous fixes, comprehensive documentation, support for installation via pip, and support for current versions of Python, is available in Gymnasium Robotics (https://github.com/Farama-Foundation/Gymnasium-Robotics, https://robotics.farama.org/).
Benchmark for Continuous Multi-Agent Robotic Control, based on OpenAI's Mujoco Gym environments.
Described in the paper *Deep Multi-Agent Reinforcement Learning for Decentralized Continuous Cooperative Control* by Christian Schroeder de Witt, Bei Peng, Pierre-Alexandre Kamienny, Philip Torr, Wendelin Böhmer, and Shimon Whiteson, Torr Vision Group and Whiteson Research Lab, University of Oxford, 2020.
Note: you require OpenAI Gym version 0.10.8 and MuJoCo 2.1.
Simply clone this repository and put `./src` on your `PYTHONPATH`. To render, please also set the following environment variables:

```
LD_LIBRARY_PATH=${HOME}/.mujoco/mujoco210/bin;
LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libGLEW.so
```
```python
from multiagent_mujoco.mujoco_multi import MujocoMulti
import numpy as np
import time

def main():
    # Two agents, each controlling 3 joints of a HalfCheetah.
    env_args = {"scenario": "HalfCheetah-v2",
                "agent_conf": "2x3",
                "agent_obsk": 0,
                "episode_limit": 1000}
    env = MujocoMulti(env_args=env_args)
    env_info = env.get_env_info()

    n_actions = env_info["n_actions"]
    n_agents = env_info["n_agents"]
    n_episodes = 10

    for e in range(n_episodes):
        env.reset()
        terminated = False
        episode_reward = 0

        while not terminated:
            obs = env.get_obs()      # per-agent observations
            state = env.get_state()  # global state

            # Sample a random continuous action for each agent.
            actions = []
            for agent_id in range(n_agents):
                action = np.random.uniform(-1.0, 1.0, n_actions)
                actions.append(action)

            reward, terminated, _ = env.step(actions)
            episode_reward += reward
            time.sleep(0.1)
            env.render()

        print("Total reward in episode {} = {}".format(e, episode_reward))

    env.close()

if __name__ == "__main__":
    main()
```
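The `agent_obsk` parameter controls up to which distance in the joint graph agents observe the joints of other agents (0 means each agent observes only its own joints). As a minimal sketch, assuming `get_env_info()` also reports `obs_shape` alongside `n_actions` and `n_agents` as in the example above, you can compare per-agent observation sizes:

```python
from multiagent_mujoco.mujoco_multi import MujocoMulti

# Sketch: compare per-agent observation sizes for different agent_obsk
# values. Assumes get_env_info() exposes "obs_shape" as in the SMAC-style
# API used in the example above.
for k in (0, 1):
    env = MujocoMulti(env_args={"scenario": "HalfCheetah-v2",
                                "agent_conf": "2x3",
                                "agent_obsk": k,
                                "episode_limit": 1000})
    print("agent_obsk={} -> obs_shape={}".format(k, env.get_env_info()["obs_shape"]))
    env.close()
```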
Tasks can be trivially extended by adding entries in `src/multiagent_mujoco/obsk.py`, which defines the joint graph and agent partitions for each scenario (a sketch follows below).
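For illustration only, here is a minimal sketch of what a new entry might look like, assuming `obsk.py` keeps its `Node`/`HyperEdge` helpers; the scenario, joint names, and indices below are hypothetical:

```python
# Hypothetical example of a new scenario entry for obsk.py.
# Node/HyperEdge signatures are assumed from the existing file.
from multiagent_mujoco.obsk import Node, HyperEdge

def my_robot_parts_and_edges(partitioning):
    # One Node per actuated joint: (label, qpos_ids, qvel_ids, act_ids).
    joint0 = Node("joint0", -2, -2, 0)
    joint1 = Node("joint1", -1, -1, 1)

    # Edges describe adjacency in the body graph; agent_obsk walks this
    # graph to decide which neighbouring joints each agent observes.
    edges = [HyperEdge(joint0, joint1)]

    # Partitions map an agent_conf string to the joints owned by each agent.
    if partitioning == "2x1":  # two agents, one joint each
        parts = [(joint0,), (joint1,)]
    else:
        raise Exception("UNKNOWN partitioning config: {}".format(partitioning))

    globals_ = []  # extra global observation entries (none in this sketch)
    return parts, edges, globals_
```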
Unless stated otherwise, all the parameters given below are to be used with `multiagent_mujoco.MujocoMulti` (an instantiation sketch follows the list).
env_args.scenario="Ant-v2"
env_args.agent_conf="2x4"
env_args.agent_obsk=1
env_args.scenario="Ant-v2"
env_args.agent_conf="2x4d"
env_args.agent_obsk=1
env_args.scenario="Ant-v2"
env_args.agent_conf="4x2"
env_args.agent_obsk=1
env_args.scenario="HalfCheetah-v2"
env_args.agent_conf="2x3"
env_args.agent_obsk=1
env_args.scenario="HalfCheetah-v2"
env_args.agent_conf="6x1"
env_args.agent_obsk=1
env_args.scenario="Hopper-v2"
env_args.agent_conf="3x1"
env_args.agent_obsk=1
env_args.scenario="Humanoid-v2"
env_args.agent_conf="9|8"
env_args.agent_obsk=1
env_args.scenario="HumanoidStandup-v2"
env_args.agent_conf="9|8"
env_args.agent_obsk=1
env_args.scenario="Reacher-v2"
env_args.agent_conf="2x1"
env_args.agent_obsk=1
env_args.scenario="Swimmer-v2"
env_args.agent_conf="2x1"
env_args.agent_obsk=1
env_args.scenario="Walker2d-v2"
env_args.agent_conf="2x3"
env_args.agent_obsk=1
env_args.scenario="manyagent_swimmer"
env_args.agent_conf="10x2"
env_args.agent_obsk=1
env_args.scenario="manyagent_ant"
env_args.agent_conf="2x3"
env_args.agent_obsk=1
env_args.scenario="coupled_half_cheetah"
env_args.agent_conf="1p1"
env_args.agent_obsk=1
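Each configuration above maps directly onto the `env_args` dictionary from the earlier example. A minimal sketch for the 4-Agent Ant setting, using the same `MujocoMulti` API:

```python
from multiagent_mujoco.mujoco_multi import MujocoMulti

# The env_args.* settings above become keys of the env_args dictionary
# passed to MujocoMulti (here: 4-Agent Ant).
env = MujocoMulti(env_args={"scenario": "Ant-v2",
                            "agent_conf": "4x2",
                            "agent_obsk": 1,
                            "episode_limit": 1000})
print(env.get_env_info()["n_agents"])  # expected: 4
env.close()
```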
CoupledHalfCheetah features two separate HalfCheetah agents coupled by an elastic tendon. You can add more tendons or create novel coupled scenarios by modifying `coupled_half_cheetah.py` and `assets/coupled_half_cheetah.xml`.
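As a minimal sketch, the coupled scenario is instantiated like any other task, using the `1p1` configuration from the list above (one agent per cheetah):

```python
from multiagent_mujoco.mujoco_multi import MujocoMulti

# Sketch: instantiate CoupledHalfCheetah with the "1p1" configuration,
# i.e. each agent controls one of the two tendon-coupled cheetahs.
env = MujocoMulti(env_args={"scenario": "coupled_half_cheetah",
                            "agent_conf": "1p1",
                            "agent_obsk": 1,
                            "episode_limit": 1000})
env_info = env.get_env_info()
print(env_info["n_agents"], env_info["obs_shape"])
env.close()
```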