
[Feature Request] Refactor collectors to account for single vs multi-agent settings #939

Open vmoens opened 1 year ago

vmoens commented 1 year ago

Motivation

Batched envs already support the multi-agent setting by accepting either a single env constructor or a list of constructors:

# single agent
env = ParallelEnv(N, env_constructor)
# multi agent
env = ParallelEnv(N, [env_constructor_1, ..., env_constructor_N])

Proposition

We could do the same for the multiprocessed and distributed data collectors. Passing a list of constructors would make the collector treat them as distinct envs and use a LazyStackedTensorDict under the hood (we could run a quick check: if all the elements of the list have the same id, we raise a warning and treat them as identical). If a single env constructor is passed, we use an expanded tensordict as the container, which is more memory efficient.
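A minimal sketch of that dispatch, assuming the collector has already instantiated its envs (the helper name _make_container is hypothetical; fake_tensordict() is the existing EnvBase method):

import warnings

import torch

def _make_container(create_env_fn, envs):
    # Sketch only: `create_env_fn` mirrors the existing collector kwarg; the
    # identity check and container choice are the proposal, not current behavior.
    if isinstance(create_env_fn, (list, tuple)):
        if len({id(fn) for fn in create_env_fn}) == 1:
            # Every entry is the same callable: warn and treat as a single env.
            warnings.warn(
                "All env constructors are identical; treating them as one env."
            )
            return envs[0].fake_tensordict().expand(len(envs))
        # Genuinely different envs: torch.stack over tensordicts returns a
        # LazyStackedTensorDict, which tolerates heterogeneous specs.
        return torch.stack([env.fake_tensordict() for env in envs], 0)
    # Single constructor: an expanded tensordict shares storage across
    # the batch dimension, hence the memory savings.
    return envs[0].fake_tensordict().expand(len(envs))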

Changes

We would change the constructor of these classes by adding a num_env (or similar) kwarg that would match ParallelEnv and SerialEnv.

The collector and Env constructors would then be fairly similar.
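Side by side, the two would read something like this (the num_env kwarg on the collector is the hypothetical part of the proposal):

# today: batched envs take the env count plus one or more constructors
env = ParallelEnv(N, env_constructor)
env = ParallelEnv(N, [env_constructor_1, ..., env_constructor_N])
# proposed: collectors would accept the same pattern
collector = MultiSyncDataCollector(env_constructor, policy, num_env=N)
collector = MultiSyncDataCollector([env_constructor_1, ..., env_constructor_N], policy)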

cc @matteobettini @XuehaiPan @albertbou92

matteobettini commented 1 year ago

Batched envs already support the multi-agent setting by accepting either a single env constructor or a list of constructors:

# single agent
env = ParallelEnv(N, env_constructor)
# multi agent
env = ParallelEnv(N, [env_constructor_1, ..., env_constructor_N])

So I have a question here, maybe I am missing something, but how are multiple environment constructors related to multi-agent?

vmoens commented 1 year ago

You could have setups less fancy than VMAS, where one env controls one robot and another env controls another robot, without a single env class handling both. Seems like something that would cross people's minds, in MARL or in multi-task (e.g. playing multiple Atari games at the same time).
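Concretely, the multi-task case would already be expressible with batched envs, since ParallelEnv accepts a list of distinct constructors as noted above (the env IDs here are just an example and assume the ALE gym envs are installed):

from torchrl.envs import ParallelEnv
from torchrl.envs.libs.gym import GymEnv

# two different Atari tasks stepped side by side; per-env specs may differ
env = ParallelEnv(2, [
    lambda: GymEnv("ALE/Pong-v5"),
    lambda: GymEnv("ALE/Breakout-v5"),
])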

matteobettini commented 1 year ago

But then, in the MARL case, would the robots still share the same simulation world and interact with each other in it? I just never thought of a MARL environment as a list of single-agent environments, as the agents have to be entangled together and step synchronously.

vmoens commented 1 year ago

But then, in the MARL case, would the robots still share the same simulation world and interact with each other in it? I just never thought of a MARL environment as a list of single-agent environments, as the agents have to be entangled together and step synchronously.

I was considering a "real world" setting where each robot is an independent agent. It can communicate with the others, but through point-to-point communication (i.e. shared information would appear in each robot's observation space). Not super optimal, but somewhat realistic, no?

matteobettini commented 1 year ago

I was considering a "real world" setting where each robot is an independent agent. It can communicate with the others, but through point-to-point communication (i.e. shared information would appear in each robot's observation space). Not super optimal, but somewhat realistic, no?

In this real-world setting, if two of the robots take actions that lead them to bump into each other, will they observe the collision? I.e., will robot 1's observation be influenced by robot 2?

If yes, they are sharing the env, and we can use a list of their specs instead of a list of envs.

If no, this is not a multi-agent case but a single agent replicated N times (not MARL).

vmoens commented 1 year ago

Yeah, in my imaginary case there is one tensor like "position" that is as big as the number of agents and is shared among all of them with some latency. Anyhow, it's also needed for multi-task, so even beyond this use case I think it'd be useful.
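As a toy illustration of that observation layout (all names and shapes here are made up):

import torch
from tensordict import TensorDict

N = 3  # number of agents
# every agent sees its own state plus the (possibly stale) positions of all N agents
obs = TensorDict(
    {
        "own_state": torch.randn(N, 4),
        "positions": torch.randn(N, 2).expand(N, N, 2),  # broadcast to every agent
    },
    batch_size=[N],
)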