vmoens opened this issue 1 year ago
Batched envs already support multi-agent by allowing either a single env constructor or a list of env constructors to be passed:

    # single agent
    env = ParallelEnv(N, env_constructor)
    # multi agent
    env = ParallelEnv(N, [env_constructor_1, ..., env_constructor_N])
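To make the two call forms concrete, here is a minimal pure-Python sketch of the dispatch they imply, assuming one constructor per worker. `normalize_env_fns` and the toy robot constructors are hypothetical names for illustration, not TorchRL's internal code.

```python
from typing import Callable, List, Sequence, Union

EnvFn = Callable[[], object]


def normalize_env_fns(num_workers: int,
                      create_env_fn: Union[EnvFn, Sequence[EnvFn]]) -> List[EnvFn]:
    """Hypothetical helper: return exactly one constructor per worker."""
    if callable(create_env_fn):
        # Single constructor: every worker builds the same env.
        return [create_env_fn] * num_workers
    # List of constructors: worker i builds env i, so the lengths must match.
    fns = list(create_env_fn)
    if len(fns) != num_workers:
        raise ValueError(f"expected {num_workers} constructors, got {len(fns)}")
    return fns


def make_robot_a():
    return "robot_a"


def make_robot_b():
    return "robot_b"


print([fn() for fn in normalize_env_fns(3, make_robot_a)])
# ['robot_a', 'robot_a', 'robot_a']
print([fn() for fn in normalize_env_fns(2, [make_robot_a, make_robot_b])])
# ['robot_a', 'robot_b']
```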
I have a question here (maybe I am missing something): how are multiple environment constructors related to multi-agent settings?
You could have something less fancy than VMAS, where one env controls one robot and another env controls a second robot, without a single env class handling both. It seems like something that would cross people's minds, in MARL or in multi-task settings (e.g. playing multiple Atari games at the same time).
But then, in the MARL case, would the robots still share the same simulation world and interact with each other in it? I just never thought of a MARL environment as a list of single-agent environments, since the agents have to be entangled together and step synchronously.
I was considering a "real-world" setting where each robot is an independent agent. It can communicate with the others, but only through point-to-point communication (i.e. shared information would appear in each robot's observation space). Not super optimal, but somewhat realistic, no?
In this real-world setting, if two of the robots take actions that lead them to bump into each other, will they observe the collision? I.e. will robot 1's observation be influenced by robot 2?
If yes, they are sharing the env, and we can use a list of their specs instead of a list of envs.
If no, this is not a multi-agent case but a single agent replicated N times (not MARL).
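The yes/no test above can be illustrated with a toy 1-D world. `SharedWorld` is a hypothetical plain-Python class (no TorchRL): when robots share one state and step synchronously, robot 1 can observe a collision caused only by robot 2's action; when each robot gets its own copy of the world, that can never happen.

```python
from typing import Dict, List


class SharedWorld:
    """Toy 1-D world: all robots live in one state and step synchronously."""

    def __init__(self, positions: List[int]):
        self.positions = list(positions)

    def step(self, actions: List[int]) -> List[Dict[str, object]]:
        # All agents act at once on the same shared state.
        self.positions = [p + a for p, a in zip(self.positions, actions)]
        obs = []
        for i, p in enumerate(self.positions):
            # A robot's collision flag depends on the other robots' positions.
            collided = any(p == q for j, q in enumerate(self.positions) if j != i)
            obs.append({"position": p, "collision": collided})
        return obs


# Multi-agent: one env, two robots.
shared = SharedWorld([0, 2])
shared.step([1, 0])                    # positions become [1, 2]
obs = shared.step([0, -1])             # robot 1 stays still, robot 2 moves onto it
print([o["collision"] for o in obs])   # [True, True]

# Replicated single agent: each robot has its own copy of the world.
solo_1, solo_2 = SharedWorld([1]), SharedWorld([2])
obs_1 = solo_1.step([0])
obs_2 = solo_2.step([-1])              # same coordinate, different worlds
print(obs_1[0]["collision"], obs_2[0]["collision"])  # False False
```

Robot 1 chose action 0 in the shared world and still sees a collision, which is exactly the "sharing the env" case; the two solo worlds never see each other.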
Yeah, in my hypothetical case there is one tensor like "position" whose size equals the number of agents and which is shared among all of them with some latency. Anyhow, this is also needed for multi-task, so even beyond this use case I think it would be useful.
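A hypothetical sketch of the "shared tensor with latency" idea: a single buffer with one slot per agent, where every agent reads the snapshot published `latency` steps earlier. Plain Python lists stand in for a tensor, and `DelayedSharedPositions` is an illustrative name, not a TorchRL class.

```python
from collections import deque
from typing import List


class DelayedSharedPositions:
    """Hypothetical shared buffer: one slot per agent, read with a fixed delay."""

    def __init__(self, num_agents: int, latency: int):
        # Keep the last `latency + 1` snapshots, pre-filled with zeros;
        # the oldest snapshot is the one agents get to see.
        self.history = deque([[0.0] * num_agents] * (latency + 1), maxlen=latency + 1)

    def publish(self, positions: List[float]) -> None:
        # Each step, the true positions of all agents are written once.
        self.history.append(list(positions))

    def read(self) -> List[float]:
        # Every agent observes the same, possibly stale, snapshot.
        return list(self.history[0])


buf = DelayedSharedPositions(num_agents=2, latency=1)
buf.publish([1.0, 2.0])
print(buf.read())  # [0.0, 0.0]  (initial placeholder, new data not visible yet)
buf.publish([1.5, 2.5])
print(buf.read())  # [1.0, 2.0]
```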
Motivation
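Batched envs already support multi-agent by allowing either a single env constructor or a list of env constructors to be passed (e.g. ParallelEnv(N, env_constructor) versus ParallelEnv(N, [env_constructor_1, ..., env_constructor_N])).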
Proposition
We could do the same for the multiprocessing (mp) and distributed data collectors. Passing a list of env constructors would cause the collector to treat them as different envs and use LazyStackedTensorDict under the hood (we could do a quick check: if all the elements of the list have the same id, we raise a warning and consider them identical). If a single env is passed, we use an expanded tensordict as the container, which is more memory efficient.
Changes
We would change the constructor of these classes by adding a num_env (or similar) kwarg that would match ParallelEnv and SerialEnv.
The collector and Env constructors would then be fairly similar.
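A minimal pure-Python sketch of the proposed collector constructor, assuming the behavior described in the proposition: one constructor plus num_env yields identical envs in an expanded container, a list yields a lazily stacked one, and a list that repeats the same object triggers a warning. `CollectorSketch`, the string container labels and the default `num_env=1` are hypothetical placeholders; a real collector would build an expanded tensordict or a LazyStackedTensorDict rather than record a string.

```python
import warnings
from typing import Callable, List, Optional, Sequence, Union

EnvFn = Callable[[], object]


class CollectorSketch:
    """Hypothetical collector constructor following the proposal."""

    def __init__(self, create_env_fn: Union[EnvFn, Sequence[EnvFn]],
                 num_env: Optional[int] = None):
        if callable(create_env_fn):
            # Single constructor: replicate it num_env times (default 1 is an assumption).
            n = 1 if num_env is None else num_env
            self.env_fns: List[EnvFn] = [create_env_fn] * n
            self.container = "expanded"
        else:
            self.env_fns = list(create_env_fn)
            if not self.env_fns:
                raise ValueError("expected at least one env constructor")
            if num_env is not None and num_env != len(self.env_fns):
                raise ValueError(
                    f"num_env={num_env} but {len(self.env_fns)} constructors given")
            if len({id(fn) for fn in self.env_fns}) == 1:
                # Quick identity check: the same object repeated means identical envs.
                warnings.warn("all env constructors are the same object; "
                              "treating the envs as identical")
                self.container = "expanded"
            else:
                # Different constructors may yield different specs: stack lazily.
                self.container = "lazy_stacked"
        self.num_env = len(self.env_fns)


def make_pong():
    return "pong"


def make_breakout():
    return "breakout"


single = CollectorSketch(make_pong, num_env=4)
multi = CollectorSketch([make_pong, make_breakout])
print(single.num_env, single.container)  # 4 expanded
print(multi.num_env, multi.container)    # 2 lazy_stacked
```

Inferring num_env from the list length keeps a collector call close to ParallelEnv(N, ...), while the explicit kwarg is only required for the single-constructor form.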
cc @matteobettini @XuehaiPan @albertbou92