What do you want to investigate?
Investigate how the performance of the MAPPO algorithm is affected by adding the agent's ID (the agent's unique identifier) to the global state that is shared by all agents.
To investigate this, we will compare two orders of applying the wrappers when setting up the environment for the MAPPO systems:
1. Apply the AgentIDWrapper first, which adds each agent's ID to its observation, and then the GlobalStateWrapper, which concatenates the observations of all agents to form the global state.
2. Reverse the order: apply the GlobalStateWrapper first and then the AgentIDWrapper.
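The difference between the two orders can be illustrated with a toy sketch. The functions below are hypothetical stand-ins, not the real AgentIDWrapper and GlobalStateWrapper classes, and the dict layout is invented for illustration. The sketch also assumes that, when the ID step runs second, it prepends each agent's own ID to that agent's copy of the global state; the issue text does not spell this out.

```python
# Toy stand-ins for the two wrappers. Names and data layout are hypothetical
# and do not mirror the real wrapper classes.

def one_hot(index, size):
    """Return a one-hot list of length `size` with a 1.0 at `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]

def add_agent_ids(timestep):
    """Prepend each agent's one-hot ID to its observation.

    Assumption: if a global state is already present, the agent's own ID is
    also prepended to that agent's copy of the global state.
    """
    n = len(timestep["obs"])
    out = {"obs": [one_hot(i, n) + o for i, o in enumerate(timestep["obs"])]}
    if "global_state" in timestep:
        out["global_state"] = [
            one_hot(i, n) + g for i, g in enumerate(timestep["global_state"])
        ]
    return out

def add_global_state(timestep):
    """Concatenate all agents' observations; every agent gets the same copy."""
    n = len(timestep["obs"])
    flat = [x for o in timestep["obs"] for x in o]
    return {"obs": timestep["obs"], "global_state": [list(flat) for _ in range(n)]}

raw = {"obs": [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]}  # 3 agents, 2 features each

first = add_global_state(add_agent_ids(raw))   # order 1: IDs, then global state
second = add_agent_ids(add_global_state(raw))  # order 2: global state, then IDs

first_len = len(first["global_state"][0])    # 3 * (3 + 2) entries
second_len = len(second["global_state"][0])  # 3 + 3 * 2 entries
print(first_len, second_len)
```

In order 1 the global state contains every agent's one-hot ID; in order 2 (under the stated assumption) it contains only the ID of the agent it belongs to.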
As a third approach, add an embedding layer that encodes each agent's ID as a fixed-size vector instead of a one-hot vector, and repeat the two wrapper orders above with it.
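A minimal sketch of the idea, assuming a lookup table with one row per agent: the ID features then have a fixed length (`embed_dim`, a hypothetical value) regardless of the number of agents. The table here is filled with fixed random numbers purely for illustration; in an actual network it would be a learned parameter. All names are made up for this example.

```python
import random

def one_hot(index, size):
    """Return a one-hot list of length `size` with a 1.0 at `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]

def make_embedding_table(num_agents, embed_dim, seed=0):
    """One row of `embed_dim` values per agent (random here, learned in practice)."""
    rng = random.Random(seed)
    return [
        [rng.uniform(-0.1, 0.1) for _ in range(embed_dim)]
        for _ in range(num_agents)
    ]

def embed_agent_id(table, agent_index):
    """Look up the fixed-size vector for one agent."""
    return list(table[agent_index])

def with_id_features(obs, id_features):
    """Prepend the ID features to an observation."""
    return id_features + obs

num_agents, embed_dim = 27, 4  # hypothetical sizes
table = make_embedding_table(num_agents, embed_dim)
obs = [0.5, -1.0, 2.0]

one_hot_input = with_id_features(obs, one_hot(5, num_agents))      # 27 + 3 entries
embedded_input = with_id_features(obs, embed_agent_id(table, 5))   # 4 + 3 entries
print(len(one_hot_input), len(embedded_input))
```

The one-hot input grows with the number of agents, while the embedded input stays at `embed_dim` plus the observation size.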
NB: In the new make_env.py file we will adopt the second approach, since the agent IDs are represented as one-hot vectors, which can become excessively large as the number of agents increases (e.g., in SMAC), and in the first approach every agent's ID vector ends up in the global state.
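To make the size concern concrete, here is a rough count of how many global-state entries are taken up by one-hot IDs. The function name is hypothetical, the agent counts are arbitrary examples, and the figure for the second order rests on the assumption that only the agent's own ID is attached to its copy of the global state.

```python
def id_entries_in_global_state(num_agents, id_wrapper_first):
    """Number of global-state entries occupied by one-hot agent IDs."""
    if id_wrapper_first:
        # Each of the N observations carries an N-dim one-hot ID, and all
        # N observations are concatenated into the global state.
        return num_agents * num_agents
    # Assumption: only the agent's own N-dim one-hot ID is attached.
    return num_agents

for n in (3, 10, 27):
    print(n, id_entries_in_global_state(n, True), id_entries_in_global_state(n, False))
```

Under these assumptions the ID overhead grows quadratically with the number of agents in the first order and linearly in the second.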
Definition of done
This investigation is complete once experiments have been run for both cases above, it has been verified which method improves the results, and a decision has been made on which method to adopt.
What was the conclusion of your investigation?
Preliminary conclusion: In the first method, where the AgentIDWrapper is applied first, adding the agent ID to each local observation before concatenating all agents' observations into the global state introduced significant noise. The agent IDs are one-hot vectors, which can become excessively large as the number of agents increases (e.g., in SMAC). When we tested and compared the two methods, the second approach (GlobalStateWrapper first, then AgentIDWrapper) yielded better results and allowed agents to learn more efficiently. We have therefore decided to adopt the second approach for now.
Final conclusion and decision: TBD.
Checklist
[x] Run experiments with the two above methods.
[x] Compare the obtained results with one-hot vector IDs.
[ ] Implement the third approach.
[ ] Compare the first results with the third approach.