Closed keskival closed 2 years ago
The lack of symmetry is odd. But if we look at step()
as actually obs = observe()
and rew, done, info = act()
combined, then reset()
is reset()
(which has no return value) + observe()
.
I think changing this might break the current implementation of vecenvs, since a reset step would need to return (pre-reset info, post-reset info).
info
is generally not meant to be part of the observation. If you can't represent your observation using a gym.Space
you can just return some other object and not obey the space interface (though of course this means that anything that requires a valid space to work will not work with your environment).
Thank you for the quick response!
I appreciate the clarification of the purpose of the info return item. Currently the documentation states as follows:
"info (dict): diagnostic information useful for debugging. It can sometimes be useful for learning (for example, it might contain the raw probabilities behind the environment’s last state change). However, official evaluations of your agent are not allowed to use this for learning."
This is given in association to other observation information, with observation, reward and done. But as you say, it specifically mentions state change which associates it to act.
I still think the interfaces should be symmetric, or that info should at least be retrievable from the initial state through the vectorized environment interface, because there is no other good way to access such debugging relevant state information in the environment when one does not want to incorporate it to the observation space for various reasons.
How about __getattr__
method in vectorized env, that could query all sub-environments and returns a list of results?
in fact, I think there is already a PR to that extent: https://github.com/openai/gym/pull/1600/files (maybe it is reasonable to limit its scope to read-only properties)
@vwxyzjn @cpnota opinions on what to do with this?
I agree that it should have been designed to return info
to begin with, but obviously at this point changing the core API is a bad idea.
Regarding #1600 , it seems potentially useful but I have no strong opinions.
I don't have a strong opinion on this, but would prefer to keep things backward-compatible if we make this change (e.g. do gym.reset(info=True)
). Also, if you want to get the relevant diagnostics / info, you could always just call a function from the env.
I like the idea of gym.reset(info=True)
I'd be pretty happy to take on this issue. It seems like the most obvious change to make is to add the optional argument in the abstract reset function in the Env class. I think this won't break any existing implementations of the Env
class. Also I think the default value for the optional info
argument to the reset function should be False
rather than True
; if we default to True
, then existing code which uses existing environments will break because the default behavior of the reset function will be to return two objects instead of one. I think the slight asymmetry that will be retained by having info default to False is probably worthwhile in terms of not breaking backwards compatibility.
Best I can tell though, the presence of optional arguments in the Env
class is really just a style guide for each environment, so the core of this effort will actually be modifying all the environments to offer this functionality. In theory this shouldn't be a breaking change for any environment, or have any influence on environment behavior or determinism; so I'm interested to hear if this would be an okay change to make to all the environments without incrementing their version number.
If this all seems reasonable, feel free to assign this issue to me, and I'll start working on a pull request.
Hey, I can't assign people due to an issue with github, but I'll have @RedTachyon review your plan and let you know
So in general I like the idea, but there is one part that worries me a little bit, specifically the fact that with this approach, the return signature of env.reset
changes depending on the input. Which would be doable (although mildly annoying) with some algebraic data type in a sanely typed language, and in python it can get by through its duck typing. At the same time, I'm not sure if there's a better way to do it at the moment.
To be precise, my problem is with the fact that sometimes you'd need to do
obs = env.reset(...)
and sometimes
obs, info = env.reset(...)
I'd actually be in favor of moving towards the latter as the default behavior e.g. in a 1.0 release. Aside from people getting angry at any changes, I think it's a sane design choice, and the 1.0 version mark might make them complain a bit less.
Also as a word of warning - this will likely be a bit more tricky to properly implement across gym than you expect due to the existence of vector environments. The easy part will be passing the info
argument into each individual environment. The more difficult part will be combining all the info
outputs (although there should be code to do the same thing in step
, so it'd probably be a matter of copying it and adjusting a little bit)
I'm not changing the default of this before a 1.0 release and we have to add functionality for this so Atari can return seeds with the new PR. We can revisit changing the default with 1.0
Sounds good @balisujohn
My two cents regarding #2546: Have you considered making this option class-wide? Because some may not prefer calling reset(info=false)` every time, and in light of a potential breaking change this would make the codes more transparent.
Another reason is that in many gym wrappers (e.g., dm_env, TFAgents, etc.) reset()
is assumed to take no arguments, which means that in order to retrieve info
from reset()
, the downstream wrappers need to catch up with this change. However if returning dict
were possible by changing arguments a constructor, we can achieve the same behavior without requiring those wrappers to be updated.
@wookayin The intention is not that they'd call obs = env.reset(info=False)
, but that they'd call obs, info = env.reset()
, or obs, _ = env.reset()
if they don't need the info.
And yes, the downstream code will have to catch up with the 1.0 release when the default API is changed. Creating a compatibility wrapper isn't hard though, so they can do that, or maybe we'll even include it in a future release. Or they can do maintenance on the libraries they're maintaining.
This has been merged into master as a flag to step
Are we sure we want to make a breaking change? This is going to break all the previous codes that are using gym. I'm worried this is too harsh, and I suggest we keep the previous API. I still think there is a way to keep the previous behavior as a default one, but env.reset()
would return a tuple when configured so (like env.reset(return_info=True)
or return_info=True
passed to the constructor).
UPDATE: It looks like the comment by @RedTachyon was simply misleading, #2546 is indeed NOT making any breaking changes. On 0.22.0 and 1.x, env.reset()
will still return a single observation as return_info
is by default False
. I guess you probably have meant env.reset(return_info=True)
for the latter (or without the argument if config were class-wide, if implemented).
@wookayin My comment referred to what this will look like in the future, when we do intend to change the default API. In 0.xx, the defaults will remain the way they are right now. In 1.x, there are no such promises.
This change broke a lot of code.
All my codes are not working now
To be symmetric with the observation returned from the step() method, I think reset() method should also return info for arbitrary information about the game state.
This would be useful for certain games where non-Space compliant information about the game state could be returned, especially for the subprocess vectorized environments where the environment internal state is not otherwise easily accessible by the agent.
Sure, in principle all observed state should be projected to the observation space, but that is not always possible, which is why I assume returned "info" exists in the step() method in the first place.