This is great; does this mean creating a centralized value function works out of the box?
Additional questions: do the agents now have a group id and an agent id? The line
"qmix" if agent_id == "group1" else "dqn"
makes it seem as though the agent ids no longer exist?
Kind of -- you end up with everything centralized but you can use a custom model to ensure execution is actually distributed. I guess this is a useful starting point vs everything decentralized by default.
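For reference, a rough sketch of how that mapping could sit in a trainer config once agents are grouped (the policy names, group id, spaces, and config keys below are placeholders, not a settled API):

```python
from gym.spaces import Discrete, Tuple

# Placeholder spaces: a group of two agents whose merged observation and
# action are Tuples of the individual per-agent spaces.
agent_obs_space, agent_act_space = Discrete(5), Discrete(2)
group_obs_space = Tuple([agent_obs_space, agent_obs_space])
group_act_space = Tuple([agent_act_space, agent_act_space])

config = {
    "multiagent": {
        # (policy_cls, obs_space, act_space, extra_config) entries; None
        # would let the trainer pick its default policy class.
        "policies": {
            "qmix": (None, group_obs_space, group_act_space, {}),
            "dqn": (None, agent_obs_space, agent_act_space, {}),
        },
        # Grouped agents are exposed under the single id "group1", so the
        # mapping keys off the group id rather than individual agent ids.
        "policy_mapping_fn": lambda agent_id: (
            "qmix" if agent_id == "group1" else "dqn"
        ),
    },
}
```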
Describe the problem
It is common to have groups of agents in multi-agent RL, which can have either centralized or decentralized training and execution. While in principle this can already be modeled by either (1) implementing the grouping in the env, or (2) having policies that share layers / losses, there are a couple of problems with this:
- Having to change the env definition to define groups makes it harder to switch between different approaches. Ideally, the env definition stays fixed and grouping is defined as part of the training config.
- From a code simplicity perspective, sharing layers across policy graph objects is confusing (especially considering how it interacts with batching). It is more straightforward to write a group policy as in pymarl: https://github.com/oxwhirl/pymarl_alpha/blob/master/src/controllers/basic_controller.py
The implementation could be pretty simple: just an auto-generated env wrapper that merges agent observations into a Tuple of observations.
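A minimal sketch of what that wrapper might do, assuming dict-based multi-agent observations as in RLlib's MultiAgentEnv (the class name and grouping dict here are made up for illustration):

```python
class GroupAgentsWrapper:
    """Illustrative only: merge grouped agents' obs into one Tuple obs."""

    def __init__(self, env, groups):
        # groups maps a group id to the ordered list of member agent ids,
        # e.g. {"group1": ["agent_1", "agent_2"]}.
        self.env = env
        self.groups = groups

    def reset(self):
        return self._group_obs(self.env.reset())

    def _group_obs(self, obs):
        grouped = dict(obs)
        for group_id, agent_ids in self.groups.items():
            # Pull out each member's obs and expose them as a single
            # Tuple observation under the group id; ungrouped agents
            # pass through unchanged.
            grouped[group_id] = tuple(grouped.pop(a) for a in agent_ids)
        return grouped
```

step() would just do the same bookkeeping in reverse: split the group's Tuple action back into per-agent actions, and merge the members' rewards.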
Possible API:
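Something along these lines, for example (the method name, keyword arguments, and spaces are only a sketch of one option):

```python
from gym.spaces import Discrete, Tuple

# The env definition itself stays unchanged; grouping is declared on top
# of it. `with_agent_groups` and the spaces below are illustrative.
grouping = {"group1": ["agent_1", "agent_2"]}

per_agent_obs = Discrete(5)
per_agent_act = Discrete(2)

env = base_env.with_agent_groups(  # base_env: any existing MultiAgentEnv
    grouping,
    obs_space=Tuple([per_agent_obs, per_agent_obs]),
    act_space=Tuple([per_agent_act, per_agent_act]),
)
```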