pytorch / rl

A modular, primitive-first, python-first PyTorch library for Reinforcement Learning.
https://pytorch.org/rl
MIT License

[Feature Request] Vectorized/multi-agent environments compatibility issues #777

Closed matteobettini closed 1 year ago

matteobettini commented 1 year ago

Motivation

Vectorized environments are environments that run batches of simulations, which makes it possible to benefit from parallel computation on GPUs. These environments have their own batch_size, whose dimensions can serve different purposes: for example, one dimension may index the vectorized simulation copies and another the agents of a multi-agent environment.

Currently, the torchrl environment infrastructure has some issues with environments that have non-empty batch sizes or that have a batch dimension for agents.

Ideally, we would like to use vectorized environments freely in torchrl and leverage its features, such as ParallelEnv and collectors, on top of such environments. This would create tensordicts with many dimensions in the batch_size, for example:

```python
tensordict.batch_size = (
    n_parallel_envs,  # from ParallelEnv
    n_agents,  # from env.batch_size
    n_vectorized_envs,  # from env.batch_size
    *other_env_dimensions,  # from env.batch_size
    n_rollout_samples,  # from env.rollout()
)
```
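As a toy sketch of how these dimensions compose (pure Python; the function and argument names are illustrative, not part of the torchrl API): the rollout batch_size is simply the concatenation of the dimensions contributed by each layer of the stack.

```python
def composed_batch_size(n_parallel_envs, env_batch_size, n_rollout_samples):
    """Compose a rollout batch_size: the ParallelEnv worker dimension,
    then the env's own batch dimensions, then the rollout time dimension."""
    return (n_parallel_envs, *env_batch_size, n_rollout_samples)

# e.g. 4 workers, an env with 2 agents and 32 vectorized copies, 100 steps:
print(composed_batch_size(4, (2, 32), 100))  # -> (4, 2, 32, 100)
```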

I created this issue to list and organize all the issues that need to be addressed in order to generalize to BaseEnvs with general batch sizes in torchrl:

Issues

Stacking tensordicts of heterogeneous shapes and NestedTensor compatibility (#766)(PR)

When some of the dimensions of the vectorized environment are heterogeneous (agents with different observation and action spaces that still share the other batch dimensions), we need to carry this heterogeneous data in a suitable data structure.

NestedTensors provide a natural candidate for this task. Here is a list of the operations that need to be supported by NestedTensors in order to enable this feature:
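As a minimal illustration of the kind of operation involved (using `torch.nested`, which is a prototype API and may change between releases): two agents with differently sized observations can be held in a single ragged structure without padding.

```python
import torch

# Two agents with observations of different sizes that share the same
# leading (vectorized-env) dimension: shapes (3, 4) and (3, 6).
obs_agent_0 = torch.randn(3, 4)
obs_agent_1 = torch.randn(3, 6)

# A NestedTensor holds the ragged stack without padding.
nt = torch.nested.nested_tensor([obs_agent_0, obs_agent_1])
print(nt.is_nested)  # True
```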

Heterogeneous CompositeSpec (#766)(PR #829)

Bug on how ParallelEnv sets the batch_size (#773)(PR #774)

Bug on using sorted() on CompositeSpec keys (#775)(PR #787)

Handling of the done flag when it has arbitrary dimensions (#776)(PR #788)

The _reset() method needs to be able to know which dimensions and indices to reset (#790)(PR #800)

Collectors crash with environments that have a non-empty batch_size (#807)(PR #828)

vmoens commented 1 year ago

For the last one (_reset should know the batch size) we could just pass an empty TensorDict instance. Wdyt?

matteobettini commented 1 year ago

> For the last one (_reset should know the batch size) we could just pass an empty TensorDict instance. Wdyt?

There might be use cases where only some of the dimensions of the vector have to be reset. For example, the done flag may indicate that only some of the simulations in the batch need resetting.

This is why methods such as reset_at() exist in rllib VectorEnv (https://github.com/ray-project/ray/blob/master/rllib/env/vector_env.py#L104)

vmoens commented 1 year ago

We have that in ParallelEnv through a "reset_workers" key IIRC. We could make a reset_at helper that writes the boolean mask in the tensordict.

matteobettini commented 1 year ago

> We have that in ParallelEnv through a "reset_workers" key IIRC. We could make a reset_at helper that writes the boolean mask in the tensordict.

Exactly, a key like that can be used in the reset() function of BaseEnv, and instead of being limited to the worker dimension it would span all the batch dimensions of the env.

If this key is not present, the default could be to reset all dimensions.
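The behavior described above can be sketched in plain Python (the class, method, and mask names are illustrative only, not the torchrl implementation): a reset that accepts an optional boolean mask over the batch dimension and falls back to resetting everything when no mask is given.

```python
class ToyVectorEnv:
    """Toy vectorized env: each sub-env just counts its steps."""

    def __init__(self, n_envs):
        self.steps = [0] * n_envs  # per-sub-env step counter

    def step(self):
        self.steps = [s + 1 for s in self.steps]

    def reset(self, reset_mask=None):
        # Default: reset every dimension when no mask is provided.
        if reset_mask is None:
            reset_mask = [True] * len(self.steps)
        # Reset only the masked entries, leave the others running.
        self.steps = [0 if m else s for s, m in zip(self.steps, reset_mask)]

env = ToyVectorEnv(4)
env.step(); env.step()
env.reset(reset_mask=[True, False, True, False])
print(env.steps)  # -> [0, 2, 0, 2]
```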