Closed vwxyzjn closed 2 years ago
An additional concern is that this managed to get through CI tests
Off the top of my head I don't really know how to fix this, but I think it's pretty low severity, right? Since it's a relatively niche feature, and there's a pretty simple way of doing this "correctly". It might be enough to just mention it in the documentation that batched spaces don't preserve their RNG.
Also, what exactly is the intended behavior here? After seeding a space and using it in a vectorized env, do we expect all environments's spaces to have the same RNG?
Yeah, this is a weird one. Something I can think of is the seed the batched space with the seed from the sub environment space if the seed is set. If there are multiple different seeds from the sub environment spaces probably just take the first one’s seed…?
The second option (seeding the observation_space
and action_space
of VectorEnv
, instead of the individual environments) should be the preferred one, since a VectorEnv
really is just like any environment. This could be made clearer in the documentation.
As @RedTachyon said, there is no easy way to fix this unfortunately. The only way I could see seeding being enforced is if we had a way to know the seed of an instantiated space, which is impossible with the current API; the only way to have access to seeding information is when we call space.seed()
.
As a side note (this was extensively discussed in #2422), if reproducibility is critical, I would suggest defining (and seeding) a random policy instead, using PyTorch/Jax/Numpy/TF. You'd be losing the ease to sample from any kind of space (barely, it should be easy to write code that works for any space), but you would gain a lot of control over what you want to be fully reproducible.
Should we include a disclaimer like "This method is meant to serve as a quick and dirty way of getting a valid action, and should not be used in situations where reproducibility is desirable" around space.sample()
?
I like that disclaimer
Looking at creating a solution for this, the problem is in gym.vector.utils.spaces as when a batch space is created then the new space, a seed is not provided based on the original space. However, the problem is this seed value needs to be an integer value that can't be reversed
Therefore, I have modified the space.init seed parameter to allow a raw seeding.RandomNumberGenerator object Using this modification, the first env provided to a sync vector env is used for the action space seeding
I will create a PR with this solution to allow review and comments
Edit: PR is https://github.com/openai/gym/pull/2727
An additional concern is that this managed to get through CI tests
There doesn't appear to be any tests for the batch space seeds, Im adding that as part of the PR
@vwxyzjn Thanks has been solved https://github.com/openai/gym/pull/2727, could you close please
Describe the bug
envs.action_space.sample()
is not reproducible after seeds.Code example
I think the problem is https://github.com/openai/gym/pull/2280 batches a new unseeded action space... So if I seed this action space again it will work:
But this may seem unintuitive to the users. I am not sure how to resolve this, @tristandeleu What do you think?
Checklist