smorad opened 1 year ago
On it! Do you have an example env?
Sure, here's a nested noop env that should cover most use/test cases:
import numpy as np
import gymnasium as gym
from gymnasium.spaces import Discrete, Tuple, Box, Dict


class TestEnv(gym.Env):
    def __init__(self):
        self.observation_space = Tuple((
            Discrete(2),
            Box(low=0, high=1, shape=(2,), dtype=np.float32),
            Tuple((
                Dict({"bork": Box(low=-10, high=10, shape=(1,), dtype=np.float32)}),
                Discrete(3),
            )),
        ))
        self.action_space = Discrete(2)

    def step(self, action):
        return self.observation_space.sample(), 0, False, False, {}

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        return self.observation_space.sample(), {}
In [2]: e = TestEnv()
In [3]: e.observation_space.sample()
Out[3]:
(0,
array([0.67096233, 0.22732651], dtype=float32),
(OrderedDict([('bork', array([4.789896], dtype=float32))]), 2))
Right so as of now what is clear is that this is not clear :) The core of the problem is that we do not carry tuples of data in torchrl / tensordict, but we work with dictionaries and dictionaries only.
Thoughts?
cc @matteobettini
Yes, I am aligned with 5, i.e. try to create a lazy stack from what we have (this will be represented by a lazy stacked spec, either composite or normal). If this does not succeed, we tell users that the only option left is to convert the tuple into a dict with int positional keys, and we provide a tool for that.
This way we will be able to process any tuple of dicts.
What I'm not sure about is how to handle that conversion.
My take is we provide the tool to convert the specs for people who are only using those in their custom simulators, but for gym envs we convert both specs and data.
If you use only the spec-conversion tool, then you must implement the data conversion yourself.
TLDR: the gym wrapper converts both specs and data, but it uses a public util to convert specs that can also be used by other components.
Currently the spec conversion isn't public. Making it public may require a bit of woodwork
Why not make it public?
same reason as any other private class or function
public = no deprecation, slow pace of development, robust testing needed (no more "this corner case will never happen")
The idea initially was that we were providing one and only one way to wrap a gym env in torchrl, so there was no need to provide the tooling to convert the specs. Still today I'm not sure what that would mean: why convert the specs without an env? Is it really that common to have a lib that does not use gym as a backend but uses its specs (like VMAS)? If so, shouldn't each simulator use its own conversion?
Not sure why a random user should care about converting specs from gym to torchrl.
This is what puts me in an uncomfortable position with this tuple: there does not seem to be a proper way of doing the conversion automatically, and I worry that we're getting more and more to a stage where using basic features of the lib requires a deep dive into the lib's classes. We're losing the nice GymWrapper(any_gym_env) that is compelling for many users.
If I'm understanding this correctly, item 5 would not be very helpful. If the observations are stackable, they would be represented as a MultiDiscrete(stack_dim * [feat_dim]) or a Box(shape=(stack_dim, feat_dim)) rather than a tuple.
Tuples are used primarily when the data types do not align: for example, a pose Box(shape=(6,)) and a collision sensor Discrete(2), or an Atari image Box((width, height, colour)) and a score Box((1,)).
Could we perhaps combine 2 and 3 in a Transform? It could represent the tuples as dicts with keys _0, _1, .... This way it is explicit and prevents polluting the GymWrapper further.
There is also gymnasium.spaces.utils.flatten and gymnasium.spaces.utils.flatten_space, which could be useful, as these will eliminate tuples and replace them with a large Box.
Then why don't people name them {"pose": Box, "collision_sensor": Discrete}? From a good engineering practice point of view, it seems a bit odd to provide 2 ways of doing the same thing, no?
There is also gymnasium.spaces.utils.flatten and gymnasium.spaces.utils.flatten_space which could be useful, as these will eliminate tuples and replace them with a large Box.
this is basically what we want when stacking the tuple: it would not only allow going from Box(shape=(feat_dim,)) to Box(shape=(stack_dim, feat_dim)), but also allow stacking boxes with the same ndims but different feature dims, Box(shape=(stack_dim, -1)). We would also be able to stack any dicts independently of their keys.
This is what puts me in an uncomfortable position with this tuple: there does not seem to be a proper way of doing the conversion automatically, and I worry that we're getting more and more to a stage where using basic features of the lib requires a deep dive into the lib's classes. We're losing the nice GymWrapper(any_gym_env) that is compelling for many users.
I share the same feeling. It has always been a struggle point for us to fit lists in this library
Then why don't people name them {"pose": Box, "collision_sensor": Discrete}? From a good engineering practice point of view, it seems a bit odd to provide 2 ways of doing the same thing, no?
Gym was (and still is) a poorly designed interface in many regards -- it should have kept all iterables as dictionaries. Don't even get me started on how they "flatten" the action spaces. Personally, the RL libraries I was using ages ago had issues representing dicts but could represent tuples, which is why I went with tuples for my envs.
I think one of the big draws for RL libraries is handling crazy IO specs. It takes me significantly longer to write code handling the gym specs than it does to implement DQN or PPO losses. I believe it is the most frustrating and error-prone part of RL.
I agree, hence I'm more in favour of a restrictive, clear env API than letting users do everything in every possible way. The way we try to do things at PyTorch is to give users one way of solving a problem, but not two (the "try" in this sentence is there on purpose :) ). To me, there are 2 use cases: entries that share semantics and can be stacked, and entries that are semantically different things. If your entries are just a cat and a dog, then when you're writing code it's confusing and, as you're saying, error prone. If one is an image, it should be named "image" or "pixels".
In TorchRL, we decided to handle the first by indexing tensordicts along a shape dimension and the second along a key dimension. I thought deeply about introducing a new class that would be some stack of whatever data is given to it, but it goes against what I stated above, and I feel we'd be saying something like "we propose tensordict to do things in this way, but if you want to do it in a different way it's fine too", which weakens the message and eventually breaks the logic.

To wrap it up, I think we should provide a solution that is generic. My feeling is that we should:
To come back to whether or not we should expose the gym spec conversion tool (@matteobettini) I think not. As I've said earlier, it is not clear to me where that would be needed: if you have a gym env and it's properly defined, that conversion should be done automatically. If you have something that is not gym but with gym specs, it isn't very clear what the specs mean in general as you may be doing anything you want with them so I would not expose the conversion API, that would be unsafe.
Motivation

TorchRL cannot handle environments with gymnasium.spaces.Tuple observation spaces. I think these are fairly common outside of MuJoCo/Atari envs.

Solution

Support for tuple spaces

Additional context

Checklist