pytorch / rl

A modular, primitive-first, python-first PyTorch library for Reinforcement Learning.
https://pytorch.org/rl
MIT License

[Feature Request] Gym tuple space mapping #1435

Open smorad opened 1 year ago

smorad commented 1 year ago

Motivation

TorchRL cannot handle environments with gymnasium.spaces.Tuple observation spaces. I think these are fairly common outside of MuJoCo/Atari envs.

Solution

Support for tuple spaces

Additional context

Traceback (most recent call last):
  File "/Users/smorad/code/rqdn/segment_dqn.py", line 92, in <module>
    GymWrapper(load_popgym_env(config), device=device),
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/smorad/code/from_src/torchrl/torchrl/envs/libs/gym.py", line 371, in __init__
    super().__init__(**kwargs)
  File "/Users/smorad/code/from_src/torchrl/torchrl/envs/common.py", line 1388, in __init__
    self._make_specs(self._env)  # writes the self._env attribute
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/smorad/code/from_src/torchrl/torchrl/envs/libs/gym.py", line 522, in _make_specs
    observation_spec = _gym_to_torchrl_spec_transform(
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/smorad/code/from_src/torchrl/torchrl/envs/libs/gym.py", line 192, in _gym_to_torchrl_spec_transform
    raise NotImplementedError("gym.spaces.tuple.Tuple mapping not yet implemented")
NotImplementedError: gym.spaces.tuple.Tuple mapping not yet implemented

vmoens commented 1 year ago

On it! Do you have an example env?

smorad commented 1 year ago

Sure, here's a nested noop env that should cover most use/test cases:

import numpy as np
import gymnasium as gym
from gymnasium.spaces import Discrete, Tuple, Box, Dict

class TestEnv(gym.Env):
    def __init__(self):
        # Nested observation space: a discrete value, a box, and a nested
        # tuple holding a dict and another discrete value.
        self.observation_space = Tuple((
            Discrete(2),
            Box(low=0, high=1, shape=(2,), dtype=np.float32),
            Tuple((
                Dict({"bork": Box(low=-10, high=10, shape=(1,), dtype=np.float32)}),
                Discrete(3),
            )),
        ))
        self.action_space = gym.spaces.Discrete(2)

    def step(self, action):
        # No-op: random observation, zero reward, never terminated/truncated.
        return self.observation_space.sample(), 0.0, False, False, {}

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        return self.observation_space.sample(), {}

In [2]: e = TestEnv()

In [3]: e.observation_space.sample()
Out[3]: 
(0,
 array([0.67096233, 0.22732651], dtype=float32),
 (OrderedDict([('bork', array([4.789896], dtype=float32))]), 2))
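
For reference, wrapping this env reproduces the traceback above (a minimal sketch; no popgym loader needed):

from torchrl.envs.libs.gym import GymWrapper

# Raises NotImplementedError: gym.spaces.tuple.Tuple mapping not yet implemented
GymWrapper(TestEnv())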

vmoens commented 1 year ago

Right, so as of now what is clear is that this is not clear :) The core of the problem is that we do not carry tuples of data in torchrl / tensordict; we work with dictionaries and dictionaries only.

  1. We could treat it as a LazyStacked spec, but then we would need some lazy stacked tensor structure that we don't have (and probably don't want), since it would have to carry any list of tensors / tensordicts and allow indexing, reshaping etc. like we do in tensordict.
  2. We could use a "regular" tensordict with keys "0", "1" etc., although it's not straightforward.
  3. Another option is to "force" users to convert their tuples into something else: we can provide a tuple conversion tool and give a proper error message that explains what to do.
  4. We could add an option in the conversion function to let people choose between stacking and writing keys in a tensordict. I'm not very enthusiastic about that, because it is yet another option in the env construction, and it would mean that multiple users can work with the same env but with completely different data structures. Also, on a high level, it's gonna be hard to say "I want keys for this tuple and stacking for this other one".
  5. My favourite is this: with tuples, we try to stack. If stacking is not possible (the specs do not have the same type, dtype, device, or number of dims), an error is raised that more or less looks like this: "Conversion of tuple spec to torchrl spec failed. This is likely due to heterogeneous specs in your env. To remedy, please use the conversion tool torchrl.envs.convert_gym_tuple_spec." (A sketch of this logic follows after the list.)
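
A rough sketch of that fallback, in plain gymnasium terms (stack_tuple_space is an illustrative name, and torchrl.envs.convert_gym_tuple_spec is the hypothetical tool named in the proposed error message, not an existing API):

import numpy as np
from gymnasium.spaces import Box, Tuple as TupleSpace

def stack_tuple_space(space: TupleSpace) -> Box:
    # Try to merge homogeneous Box subspaces into a single batched Box;
    # fall back to the proposed error message otherwise.
    subs = space.spaces
    homogeneous = (
        all(isinstance(s, Box) for s in subs)
        and len({s.shape for s in subs}) == 1
        and len({s.dtype for s in subs}) == 1
    )
    if not homogeneous:
        raise NotImplementedError(
            "Conversion of tuple spec to torchrl spec failed. This is likely "
            "due to heterogeneous specs in your env. To remedy, please use the "
            "conversion tool torchrl.envs.convert_gym_tuple_spec."
        )
    low = np.stack([s.low for s in subs])
    high = np.stack([s.high for s in subs])
    return Box(low=low, high=high, dtype=subs[0].dtype)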

Thoughts?

cc @matteobettini

matteobettini commented 1 year ago

Yes, I am aligned with 5, i.e. try to create a lazy stack from what we have (this would be represented by a LazyStacked spec, either composite or normal). If this does not succeed, we tell users that the only option left is to convert the tuple into a dict with int positional keys, and we provide a tool for that.

This way we will always be able to process any tuple of dicts.

vmoens commented 1 year ago

What I'm not sure about is how to handle that conversion.

matteobettini commented 1 year ago

My take is that we provide the tool to convert the specs for people who are only using those in their custom simulators, but for gym envs we convert both specs and data.

If you use only the tool to convert specs then you must implement the data conversion yourself.

TL;DR: the gym wrapper converts specs and data, but it uses a public util to convert the specs, and that util can be reused by other components.
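
A minimal sketch of that split, with illustrative names (not existing torchrl utilities): a public spec converter, plus the matching data converter the wrapper would apply to each observation:

from gymnasium.spaces import Dict as DictSpace, Tuple as TupleSpace

def tuple_space_to_dict_space(space: TupleSpace) -> DictSpace:
    # Positional string keys, since tensordict keys must be strings;
    # nested tuples would be handled recursively in practice.
    return DictSpace({str(i): sub for i, sub in enumerate(space.spaces)})

def tuple_obs_to_dict_obs(obs: tuple) -> dict:
    # Wrapper-side data conversion matching the spec conversion above.
    return {str(i): o for i, o in enumerate(obs)}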

vmoens commented 1 year ago

Currently the spec conversion isn't public. Making it public may require a bit of woodwork

matteobettini commented 1 year ago

Why not make it public?

vmoens commented 1 year ago

Same reason as for any other private class or function: public = no deprecation, slower pace of development, robust testing needed (no more "this corner case will never happen").

The idea initially was that we were providing one and only one way to wrap a gym env in torchrl, so there was no need to provide the tooling to convert the specs. Still today I'm not sure what that would mean: why convert the specs without an env? Is it really that common to have a lib that does not use gym as a backend but uses its specs (like VMAS)? If so, shouldn't each simulator use its own conversion? I'm not sure why a random user should care about converting specs from gym to torchrl.

This is what puts me in an uncomfortable position with this tuple: there does not seem to be a proper way of doing the conversion automatically, and I worry that we're getting closer and closer to a stage where using basic features of the lib requires a deep dive into the lib classes. We're losing the nice GymWrapper(any_gym_env) that is compelling for many users.

smorad commented 1 year ago

If I'm understanding this correctly, item 5 would not be very helpful. If the observations are stackable, they would be represented as a MultiDiscrete(stack_dim * [feat_dim]) or Box(shape=(stack_dim, feat_dim)) rather than a tuple.

Tuples are used primarily when the data types do not align. For example a pose Box(shape=(6,)) and a collision sensor Discrete(2). Or an Atari image Box((width, height, colour)) and a score Box((1,)).

Could we perhaps combine 2 and 3 in a Transform? It could represent the tuples as dicts with keys _0, _1, .... This way it is explicit and prevents polluting the GymWrapper further.
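
A minimal sketch of that idea on the gymnasium side, as an ObservationWrapper (TupleToDictObs is a hypothetical name; a torchrl Transform would follow the same pattern):

import gymnasium as gym
from gymnasium.spaces import Dict as DictSpace, Tuple as TupleSpace

class TupleToDictObs(gym.ObservationWrapper):
    """Re-expose a Tuple observation space as a Dict with keys "_0", "_1", ..."""

    def __init__(self, env):
        super().__init__(env)
        assert isinstance(env.observation_space, TupleSpace)
        self.observation_space = DictSpace(
            {f"_{i}": s for i, s in enumerate(env.observation_space.spaces)}
        )

    def observation(self, obs):
        return {f"_{i}": o for i, o in enumerate(obs)}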

There is also gymnasium.spaces.utils.flatten and gymnasium.spaces.utils.flatten_space which could be useful, as these will eliminate tuples and replace them with a large Box.
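
For reference, a quick usage sketch of those gymnasium utilities with the TestEnv above:

from gymnasium.spaces.utils import flatten, flatten_space

env = TestEnv()
flat_space = flatten_space(env.observation_space)  # one large Box; Discrete parts become one-hot slots
obs, _ = env.reset()
flat_obs = flatten(env.observation_space, obs)  # a single 1-D numpy array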

vmoens commented 1 year ago

Then why don't people name them {"pose": Box, "collision_sensor": Discrete}? From a good engineering practice point of view, it seems a bit odd to provide two ways of doing the same thing, no?

matteobettini commented 1 year ago

There is also gymnasium.spaces.utils.flatten and gymnasium.spaces.utils.flatten_space which could be useful, as these will eliminate tuples and replace them with a large Box.

This is basically what we want to do when stacking the tuple: it would not only allow going from Box(shape=(feat_dim,)) to Box(shape=(stack_dim, feat_dim)), but also allow stacking boxes with the same ndims but different feature dims, Box(shape=(stack_dim, -1)). We would also be able to stack any dicts independently of their keys.
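
A small illustration of that lazy-stacking behaviour with tensordict (exact semantics may vary across tensordict versions):

import torch
from tensordict import TensorDict, LazyStackedTensorDict

# Two observations with the same number of dims but different feature dims.
td_a = TensorDict({"obs": torch.zeros(3)}, batch_size=[])
td_b = TensorDict({"obs": torch.zeros(5)}, batch_size=[])

# A lazy stack keeps the parts separate, so heterogeneous leaves are allowed;
# indexing recovers each element (accessing "obs" across the stack would fail).
stacked = LazyStackedTensorDict(td_a, td_b, stack_dim=0)
print(stacked[0]["obs"].shape)  # torch.Size([3])
print(stacked[1]["obs"].shape)  # torch.Size([5])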

This is what puts me in an uncomfortable position with this tuple: there does not seem to be a proper way of doing the conversion automatically, and I worry that we're getting closer and closer to a stage where using basic features of the lib requires a deep dive into the lib classes. We're losing the nice GymWrapper(any_gym_env) that is compelling for many users.

I share the same feeling. It has always been a struggle for us to fit lists into this library.

smorad commented 1 year ago

Then why don't people name them {"pose": Box, "collision_sensor": Discrete}? From a good engineering practice point of view, it seems a bit odd to provide two ways of doing the same thing, no?

Gym was (and still is) a poorly designed interface in many regards -- it should have kept all iterables as dictionaries. Don't even get me started on how they "flatten" the action spaces. Personally, the RL libraries I was using ages ago had issues representing dicts but could represent tuples, which is why I went with tuples for my envs.

I think one of the big draws for RL libraries is handling crazy IO specs. It takes me significantly longer to write code handling the gym specs than it does to implement DQN or PPO losses. I believe it is the most frustrating and error-prone part of RL.

vmoens commented 1 year ago

I agree, hence I'm more in favour of a restrictive, clear env API than letting users do everything in every possible way. The way we try to do things at PyTorch is to give users one way of solving a problem, but not two (the "try" in this sentence is there on purpose :)). To me, there are two use cases:

To wrap it up, I think we should provide a solution that is generic. My feeling is that we should:

To come back to whether or not we should expose the gym spec conversion tool (@matteobettini): I think not. As I've said earlier, it is not clear to me where that would be needed: if you have a gym env and it's properly defined, that conversion should be done automatically. If you have something that is not gym but has gym specs, it isn't very clear what the specs mean in general, as you may be doing anything you want with them, so I would not expose the conversion API; that would be unsafe.