Closed: vwxyzjn closed this issue 2 years ago
I'd suggest using None only when necessary, because otherwise it's hard to convert the array directly into a tensor or to generate data from the venv executor. In general, tianshou's Batch already proposes a potential solution:
```python
In [27]: from tianshou.data import Batch

In [28]: a = Batch(episode={'l': 75, 'r': 75.0, 't': 36.77, 'd': "abcd"})

In [29]: b = Batch(terminal_obs=0.2097)

In [30]: Batch.stack([a, b])
Out[30]:
Batch(
    terminal_obs: array([0.    , 0.2097]),
    episode: Batch(
        l: array([75, 0]),
        r: array([75., 0.]),
        t: array([36.77, 0.  ]),
        d: array(['abcd', None], dtype=object),
    ),
)
```
I'm currently looking at this; for wrappers like `RecordEpisodeStatistics`, the most reasonable approach to me seems to be the first proposal:
```python
{
    "episode": {
        "r": [75.0, None, None, None, None],
        "l": [75, None, None, None, None],
        "t": [36.770559, None, None, None, None],
    },
    "terminal_observation": [
        array([0.2097086, -0.5355723, -0.21343598, -0.11173592], dtype=float32),
        None,
        None,
        None,
        None,
    ],
}
```
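A vectorized wrapper could produce that layout by writing each finished episode's stats into the slot for its env index; a hedged sketch of the idea (the helper name and argument layout are mine, not Gym's actual `RecordEpisodeStatistics` code):

```python
def episode_stats_info(num_envs, finished):
    """Build the proposed padded info dict.

    `finished` maps env index -> (return, length, elapsed_time).
    Slots for envs that did not finish an episode stay None.
    Illustrative sketch only, not the real wrapper implementation.
    """
    info = {
        "episode": {
            "r": [None] * num_envs,
            "l": [None] * num_envs,
            "t": [None] * num_envs,
        }
    }
    for i, (ep_return, ep_length, ep_time) in finished.items():
        info["episode"]["r"][i] = ep_return
        info["episode"]["l"][i] = ep_length
        info["episode"]["t"][i] = ep_time
    return info


# Only env 0 finished an episode this step:
info = episode_stats_info(5, {0: (75.0, 75, 36.770559)})
# info["episode"]["r"] -> [75.0, None, None, None, None]
```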
Since there are a lot of things to work on, I think it is better to agree on the new API before starting to make changes.
Also, the solution proposed by @Trinkle23897 seems good; my only doubt is that using 0s can be misleading, since a reader cannot tell a "real" zero from a zero that only means there is no info for that env.
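One way to resolve that ambiguity is to carry an explicit boolean mask alongside the zero-padded array, so consumers can tell padding from genuine zeros; a minimal sketch of the idea (the field names here are hypothetical, not part of any proposal above):

```python
import numpy as np

# Zero-padded per-env episode returns, plus a mask marking which
# entries correspond to actually finished episodes.
returns = np.array([75.0, 0.0, 0.0, 0.0, 0.0])
has_episode = np.array([True, False, False, False, False])

# A zero at a masked-out index is padding, not a genuine zero return.
real_returns = returns[has_episode]
# real_returns -> array([75.])
```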
@pseudo-rnd-thoughts @jkterry1
There is a third option, which I suspect is inefficient but removes the need for no-data placeholders such as `0` or `None`.
That is pairing the info data with the environment number. This, I think, would be efficient for sparse environment info but not for dense info:
```python
{
    "key": [
        (data, environment_number),
        ...
    ],
    ...
}
```
or
```python
{
    "key": {
        environment_number: data,
        ...
    },
    ...
}
```
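For completeness, the sparse pairing can be densified on demand when a consumer does need a flat per-env list; a minimal sketch, assuming the `(data, env_index)` tuple layout from the first variant above (the `densify` helper is mine, purely illustrative):

```python
def densify(pairs, num_envs, fill=None):
    """Expand sparse (data, env_index) pairs into a length-num_envs list.

    Env indices not present in `pairs` get `fill`. Illustrative only;
    this is the memory-inefficient step the sparse layout tries to avoid.
    """
    dense = [fill] * num_envs
    for data, env_idx in pairs:
        dense[env_idx] = data
    return dense


# Only env 0 produced an episode return this step:
sparse_info = {"episode_return": [(75.0, 0)]}
dense = densify(sparse_info["episode_return"], 5)
# dense -> [75.0, None, None, None, None]
```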
The primary problem is that this is not memory efficient, as it doesn't compress the data into a single NumPy array; however, if you don't need to store the data, then this is fine.
Otherwise, I think that Brax and tianshou have very similar implementations that we could follow, which would make it easier for gym to work with their modules.
Edit: One of my questions is what the use case is for the information needed from info, particularly given that truncation is moved from info to the step return. FYI, I'm not experienced with distributed RL.
Closing in favor of PR
Motivation
The current vector env API returns an `info` dict per environment. So if I have 5 envs, the return could look like `[{}, {}, {}, {}, {"truncated": True}]`.
While this API is intuitive, it's not a friendly interface for high-throughput environments such as brax (also see https://github.com/google/brax/issues/164) and envpool. These envs prefer the following paradigm: instead of 5 dictionaries, a single dictionary with an array of length 5:
```python
{"truncated": [False, False, False, False, True]}
```
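The two paradigms are mechanically interchangeable: the per-env list of dicts can be collapsed into one dict of arrays with a small helper. A sketch using only NumPy (the `stack_infos` name and None-padding choice are mine, not Gym's or brax's API):

```python
import numpy as np


def stack_infos(infos):
    """Convert a per-env list of info dicts into one dict of arrays.

    Keys missing from an env's dict are padded with None so every
    array has one entry per env. Illustrative sketch only.
    """
    keys = {k for info in infos for k in info}
    return {
        k: np.array([info.get(k) for info in infos], dtype=object)
        for k in keys
    }


infos = [{}, {}, {}, {}, {"truncated": True}]
stacked = stack_infos(infos)
# stacked["truncated"] -> array([None, None, None, None, True], dtype=object)
```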
Pros and Cons
Brax and envpool's approach feels more ergonomic for high-throughput envs, whereas Gym's approach is maybe more efficient for "sparse" info keys that don't appear often (I could be wrong about this).
Proposal
I propose changing the `info` API for vectorized environments to adopt brax's current design for `info`. This might also have implications for wrappers and other APIs. For example, the current vectorized environment API would return `info` like this (with `RecordEpisodeStatistics` used). I am not sure what's the best way to handle this... Maybe something like the following?
Alternative 1
Alternatively, it's possible to do this in a very hacky way: brax or envpool could always override the first sub-env's info dict to provide their desired info, something like:
```python
[{"truncated": [False, False, False, False, True]}, {}, {}, {}, {}]
```
This way we don't have to change the current API; however, it's extremely hacky and unintuitive.
Alternative 2
Alternatively, maybe `info` should be reserved for something sparse like `episode` and `terminal_observation`, and more frequent infos should be accessed via attributes (i.e., `envs.get_attr(name)`, enabled with https://github.com/openai/gym/pull/1600). So something like:

What are your thoughts @RedTachyon @JesseFarebro, @tristandeleu @XuehaiPan @Trinkle23897 @erikfrey @araffin @Miffyli