I have trouble understanding where the list of per-agent action vectors (which you pass to the MujocoMulti env) is reassembled into the action vector of the single-agent MuJoCo env so that each action reaches the correct actuator. For example, from line https://github.com/schroederdewitt/multiagent_mujoco/blob/97eab01fcff0313f1a1c275115c10616988145a3/src/multiagent_mujoco/mujoco_multi.py#L111
it seems that the multi-agent action list is simply flattened and then passed to the single-agent MuJoCo env. I do not see how this could handle both the 2-Agent Ant and 2-Agent Ant Diag setups. Figure 4 H and I of the FACMAC paper show:
2-Agent Ant (Figure 4 H):
MA action list = [blue agent, green agent] = [[a1, a2, a5, a6], [a3, a4, a7, a8]]
Flattened single agent action = [a1, a2, a5, a6, a3, a4, a7, a8]
2-Agent Ant Diag (Figure 4 I):
MA action list = [blue agent, green agent] = [[a3, a4, a5, a6], [a1, a2, a7, a8]]
Flattened single agent action = [a3, a4, a5, a6, a1, a2, a7, a8]
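In both cases the flattened vector is ordered by agent rather than by actuator index, and the order differs between the two setups, so the actions passed to the single-agent MuJoCo env do not seem to reach the actuators they were meant for.
I think this means that agents end up observing one limb but controlling another.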
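To make the concern concrete, here is a minimal sketch of the difference between plain flattening and the index-based reassembly I expected to find. It is not code from the repo: the names `PARTITIONS`, `flatten_only` and `scatter_by_actuator` are hypothetical, the partitions are my reading of Figure 4 H and I, and I am assuming the single-agent env expects its action vector in actuator order [a1, ..., a8].

```python
# Hypothetical partitions read off Figure 4 H and I, as 0-based actuator
# indices (a1 -> 0, ..., a8 -> 7). Not taken from the repo.
PARTITIONS = {
    "2-Agent Ant": [[0, 1, 4, 5], [2, 3, 6, 7]],
    "2-Agent Ant Diag": [[2, 3, 4, 5], [0, 1, 6, 7]],
}


def flatten_only(agent_actions):
    """Concatenate the per-agent action vectors in agent order."""
    return [a for agent in agent_actions for a in agent]


def scatter_by_actuator(agent_actions, partition, n_actuators=8):
    """Write each agent's actions to the actuator slots that agent controls."""
    env_action = [0.0] * n_actuators
    for actions, indices in zip(agent_actions, partition):
        for action, idx in zip(actions, indices):
            env_action[idx] = action
    return env_action


if __name__ == "__main__":
    for name, partition in PARTITIONS.items():
        # Use the actuator number as the action value, so a correctly
        # assembled vector reads 1.0, 2.0, ..., 8.0.
        agent_actions = [[float(i + 1) for i in idxs] for idxs in partition]
        print(name)
        print("  flattened:", flatten_only(agent_actions))
        print("  scattered:", scatter_by_actuator(agent_actions, partition))
```

With the action value set to the actuator number, only the scattered vector comes out in actuator order for both setups, whereas the flattened one is permuted differently in each.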
Am I missing something here?