schroederdewitt / multiagent_mujoco

Benchmark for Continuous Multi-Agent Robotic Control, based on OpenAI's Mujoco Gym environments.
Apache License 2.0

Observations are mapped to each agent but what about each agent's actions? #15

Open PBarde opened 2 years ago

PBarde commented 2 years ago

I have trouble understanding where the list of per-agent action vectors (which you pass to the MujocoMulti env) is reassembled into the single-agent Mujoco env action vector so that each entry reaches the correct actuator. For example, from line https://github.com/schroederdewitt/multiagent_mujoco/blob/97eab01fcff0313f1a1c275115c10616988145a3/src/multiagent_mujoco/mujoco_multi.py#L111

it seems that the multi-agent action list is simply flattened and then passed to the single-agent Mujoco env. I do not see how this could handle both the 2-Agent Ant and 2-Agent Ant Diag setups. In Figure 4 H and I of the FACMAC paper we have:

2-Agent Ant (Figure 4 H):

MA action list = [blue agent, green agent] = [[a1, a2, a5, a6], [a3, a4, a7, a8]]

Flattened single agent action = [a1, a2, a5, a6, a3, a4, a7, a8]

2-Agent Ant Diag (Figure 4 I):

MA action list = [blue agent, green agent] = [[a3, a4, a5, a6], [a1, a2, a7, a8]]

Flattened single agent action = [a3, a4, a5, a6, a1, a2, a7, a8]

We see that the two flattened vectors order the actuators differently, so they cannot both match the actuator order expected by the single-agent Mujoco env.

I think this would mean that agents observe one limb but control another.
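To make the concern concrete, here is a minimal sketch in plain Python. It is not the repository's code: `PARTITIONS`, `flatten`, `scatter` and `label_actions` are hypothetical names, and the 0-based actuator indices are simply read off the two lists above, assuming the single-agent env expects the order [a1, ..., a8]. `flatten` mimics what the linked line appears to do, while `scatter` shows the kind of index-based reassembly I was expecting to find.

```python
# Hypothetical 0-based actuator indices per agent, read off the lists above
# (Figure 4 H and I). These are assumptions, not values taken from the repo.
PARTITIONS = {
    "2-Agent Ant": [[0, 1, 4, 5], [2, 3, 6, 7]],
    "2-Agent Ant Diag": [[2, 3, 4, 5], [0, 1, 6, 7]],
}


def label_actions(partition):
    """Build symbolic per-agent actions, e.g. index 0 -> 'a1'."""
    return [[f"a{i + 1}" for i in indices] for indices in partition]


def flatten(agent_actions):
    """Concatenate per-agent actions in agent order."""
    return [a for agent in agent_actions for a in agent]


def scatter(agent_actions, partition, n_actuators=8):
    """Write each agent's actions at the actuator indices it controls."""
    full = [None] * n_actuators
    for actions, indices in zip(agent_actions, partition):
        for value, idx in zip(actions, indices):
            full[idx] = value
    return full


for name, partition in PARTITIONS.items():
    ma_actions = label_actions(partition)
    print(name)
    print("  flattened:", flatten(ma_actions))
    print("  scattered:", scatter(ma_actions, partition))
```

Under these assumptions, the scattered vector is [a1, ..., a8] for both setups, whereas the flattened vector differs between them.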

Am I missing something here?