Closed ultmaster closed 2 years ago
We haven't adapted to the newest version of gym in tianshou.
Union[ObsType, tuple[ObsType, dict]]
The tuple is for multi-agent because of my suggestion.
However, it is a huge change. It may cause many existing customized environments not work. So I personally think it's better to wait for a while to support the newest feature (probably when gym 1.0 release?).
I'm asking this because I want to put some extra information (e.g., collect logs) on reset.
However, I didn't realize it's a new feature recently added by gym. In that case, there's no hurry.
I think the proper way is to support both. Can we add an extra flag to determine if the env returns only obs or a tuple of (obs, info) at first? It needs to change the collector as well.
Another solution is to add an environment wrapper:
class ForceReturnInfoAtReset(gym.Wrappers):
def reset(self, **kwargs):
result = self.env.reset(**kwargs)
if isinstance(result, tuple):
return result
return result, {} # fake info
And change all tianshou's code to (obs, info).
I think tianshou doesn't need to support this natively. Rather, it needs to have the interface to let users implement similar features (e.g., #322) on their own, in a natural way.
For vector env, it should support arbitrary input arguments and return values. It can have the assumption that the first return value of the tuple must be obs, but I think other assumptions are not necessary.
For collector, I think the current collector implementation is a bit messy. Maybe collect
method needs some breakdown, so that users can customize their own collectors easily.
For whatever good my input does here:
I'm largely fine with not using the new changes until the 1.0 release comes out (or more specifically what we're planning to call the 0.5.0 release where all the breaking changes to the base API will become permanently enabled, the 1.0 release will have additional work on wrappers, hardware accelerated environments and vector environments). However, the breaking changes that aren't enabled by default really need to be updated and the minimum version of Gym needs to be updated, per https://github.com/thu-ml/tianshou/pull/613. Perhaps leaving this PR open for a few more months until the 0.5.0 release comes out and working to get #613 merged makes sense?
Gym environment API supports seed, return_info, and options in reset:
https://github.com/openai/gym/blob/dcae553204957a75db5e4169e33547ccac599609/gym/core.py#L93
But I find the information lost in tianshou:
https://github.com/thu-ml/tianshou/blob/e01385ea30d21b7efc3f9377d22d83e6afd39e2f/tianshou/env/worker/subproc.py#L90 https://github.com/thu-ml/tianshou/blob/e01385ea30d21b7efc3f9377d22d83e6afd39e2f/tianshou/env/worker/dummy.py#L34
Also the return type is
Union[ObsType, tuple[ObsType, dict]]
in gym, but onlyObsType
in tianshou.