Closed ardian-selmonaj closed 1 month ago
In DI-engine, the default MARL implementation shares one policy across all agents. This scheme is commonly used in cooperative multi-agent tasks. You can start from the existing algorithms in DI-engine, such as QMIX or MAPPO. Here is a simple example on the PettingZoo simple spread environment.
After training, if you want to extract or export the learned policy, you can find the PyTorch model `state_dict` in the saved checkpoint.
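A minimal sketch of reading the weights back out, assuming the checkpoint is a dict written with `torch.save` that keeps the network weights under a `"model"` key. The file name, the key, and the small `nn.Sequential` network are assumptions for illustration, not DI-engine's confirmed layout; printing `ckpt.keys()` on a real checkpoint shows what it actually contains.

```python
import os
import tempfile

import torch
import torch.nn as nn


def build_model() -> nn.Module:
    # Hypothetical stand-in for the trained policy network.
    return nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))


# Write a stand-in checkpoint so the example is self-contained.
# Assumption: weights are stored under the "model" key.
trained = build_model()
ckpt_path = os.path.join(tempfile.mkdtemp(), "ckpt_example.pth.tar")
torch.save({"model": trained.state_dict()}, ckpt_path)

# Load the checkpoint on CPU and pull out the state_dict.
ckpt = torch.load(ckpt_path, map_location="cpu")
state_dict = ckpt["model"]

# Restore the weights into a freshly built network of the same architecture.
restored = build_model()
restored.load_state_dict(state_dict)
restored.eval()

# Both networks should now produce the same output for the same input.
obs = torch.randn(1, 4)
with torch.no_grad():
    same = torch.allclose(trained(obs), restored(obs))
print(same)
```

The restored module can then be exported or called directly, independent of the training pipeline, as long as the architecture used to rebuild it matches the one that was trained.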
I have two questions that I have not yet been able to find an answer to:
1) How can I extract specific policies from a trained multi-agent model? For example, when three agents, "agent_1", "agent_2" and "agent_3", were trained, I would like to get the prediction of one specific agent only.
2) How can I configure MARL training such that multiple agents share the same policy? For example, "agent_1" and "agent_2" use the same policy `pi_1`.
Ray RLlib supports this, but it offers far fewer algorithms than DI-engine, so I would highly appreciate having these functionalities here. Thank you!
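To make the two questions concrete, here is a plain PyTorch sketch of the behavior being asked for. It is not DI-engine's API: `make_policy`, `policy_of`, `act`, and the observation and action sizes are hypothetical names and values chosen for illustration. Sharing is expressed by mapping two agent names to the same module object, and a single agent's prediction is obtained either by calling its policy directly or by indexing its row in a batched forward pass.

```python
import torch
import torch.nn as nn

AGENTS = ["agent_1", "agent_2", "agent_3"]
OBS_DIM, N_ACTIONS = 6, 5  # illustrative sizes, not taken from any environment


def make_policy() -> nn.Module:
    # Hypothetical small policy network that outputs action logits.
    return nn.Sequential(nn.Linear(OBS_DIM, 16), nn.ReLU(), nn.Linear(16, N_ACTIONS))


# Question 2: agent_1 and agent_2 point at the same module (pi_1),
# so they share parameters; agent_3 gets its own policy (pi_2).
pi_1, pi_2 = make_policy(), make_policy()
policy_of = {"agent_1": pi_1, "agent_2": pi_1, "agent_3": pi_2}


def act(agent: str, obs: torch.Tensor) -> int:
    # Question 1: greedy action for one named agent only.
    with torch.no_grad():
        logits = policy_of[agent](obs)
    return int(logits.argmax(dim=-1).item())


obs = {name: torch.randn(OBS_DIM) for name in AGENTS}
action_2 = act("agent_2", obs["agent_2"])

# With one fully shared policy, all agents can be evaluated in a single
# batch of shape (n_agents, OBS_DIM); one agent's output is one row.
batch = torch.stack([obs[name] for name in AGENTS])
with torch.no_grad():
    all_logits = pi_1(batch)
agent_2_logits = all_logits[AGENTS.index("agent_2")]
```

Because "agent_1" and "agent_2" reference the same module, any gradient step on `pi_1` updates the policy for both, which is the usual meaning of parameter sharing.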