openai / gym

A toolkit for developing and comparing reinforcement learning algorithms.
https://www.gymlibrary.dev

[Bug Report] VectorEnv action space seed problem #2680

Closed vwxyzjn closed 2 years ago

vwxyzjn commented 2 years ago

Describe the bug

envs.action_space.sample() is not reproducible after seeding.

Code example

import gym
import numpy as np


def make_env(env_id, seed):
    def thunk():
        env = gym.make(env_id)
        # Seed the environment and both of its spaces.
        env.seed(seed)
        env.action_space.seed(seed)
        env.observation_space.seed(seed)
        return env

    return thunk


seed = 1
envs = gym.vector.SyncVectorEnv([make_env("CartPole-v1", seed)])
sampled_actions = []
for _ in range(20):
    sampled_actions += [envs.action_space.sample()]
print(np.array(sampled_actions).flatten())
(gym-vSAllIMi-py3.9) ➜  gym git:(master) ✗ python w.py
[1 0 0 1 0 1 1 0 1 1 1 1 1 0 0 1 0 1 0 0]
(gym-vSAllIMi-py3.9) ➜  gym git:(master) ✗ python w.py
[0 1 1 1 0 0 1 0 1 0 0 0 1 1 0 0 1 1 1 1]
(gym-vSAllIMi-py3.9) ➜  gym git:(master) ✗ python w.py
[0 1 1 0 1 1 1 1 1 0 1 0 0 1 1 1 1 0 1 1]
(gym-vSAllIMi-py3.9) ➜  gym git:(master) ✗ python w.py
[0 0 0 1 1 1 1 1 0 1 1 0 1 1 0 1 0 0 0 0]
(gym-vSAllIMi-py3.9) ➜  gym git:(master) ✗ python w.py
[0 1 0 1 0 0 0 0 1 0 1 1 0 1 0 1 1 1 1 0]

I think the problem is that https://github.com/openai/gym/pull/2280 creates a new, unseeded batched action space. So if I seed this batched space explicitly, it works:

import gym
import numpy as np
def make_env(env_id, seed):
    def thunk():
        env = gym.make(env_id)
        env.seed(seed)
        env.action_space.seed(seed)
        env.observation_space.seed(seed)
        return env

    return thunk
seed = 1
envs = gym.vector.SyncVectorEnv([make_env("CartPole-v1", seed)])
# Seed the batched spaces of the vector env itself as well.
envs.action_space.seed(seed)
envs.observation_space.seed(seed)
sampled_actions = []
for _ in range(20):
    sampled_actions += [envs.action_space.sample()]
print(np.array(sampled_actions).flatten())
(gym-vSAllIMi-py3.9) ➜  gym git:(master) ✗ python w.py
[1 1 0 1 0 0 1 0 1 0 1 1 0 1 0 0 0 0 0 0]
(gym-vSAllIMi-py3.9) ➜  gym git:(master) ✗ python w.py
[1 1 0 1 0 0 1 0 1 0 1 1 0 1 0 0 0 0 0 0]
(gym-vSAllIMi-py3.9) ➜  gym git:(master) ✗ python w.py
[1 1 0 1 0 0 1 0 1 0 1 1 0 1 0 0 0 0 0 0]
(gym-vSAllIMi-py3.9) ➜  gym git:(master) ✗ python w.py
[1 1 0 1 0 0 1 0 1 0 1 1 0 1 0 0 0 0 0 0]
(gym-vSAllIMi-py3.9) ➜  gym git:(master) ✗ python w.py
[1 1 0 1 0 0 1 0 1 0 1 1 0 1 0 0 0 0 0 0]
(gym-vSAllIMi-py3.9) ➜  gym git:(master) ✗ 

But this may seem unintuitive to users. I am not sure how to resolve this. @tristandeleu, what do you think?


jkterry1 commented 2 years ago

An additional concern is that this managed to get through CI tests

RedTachyon commented 2 years ago

Off the top of my head I don't really know how to fix this, but I think it's pretty low severity, right? Since it's a relatively niche feature, and there's a pretty simple way of doing this "correctly". It might be enough to just mention it in the documentation that batched spaces don't preserve their RNG.

Also, what exactly is the intended behavior here? After seeding a space and using it in a vectorized env, do we expect all environments' spaces to have the same RNG?

vwxyzjn commented 2 years ago

Yeah, this is a weird one. One option I can think of is to seed the batched space with the seed from the sub-environment's space, if that seed is set. If the sub-environment spaces have multiple different seeds, we could probably just take the first one's seed…?

tristandeleu commented 2 years ago

The second option (seeding the observation_space and action_space of VectorEnv, instead of the individual environments) should be the preferred one, since a VectorEnv really is just like any environment. This could be made clearer in the documentation.

As @RedTachyon said, there is no easy way to fix this unfortunately. The only way I could see seeding being enforced is if we had a way to know the seed of an instantiated space, which is impossible with the current API; the only way to have access to seeding information is when we call space.seed().

As a side note (this was extensively discussed in #2422), if reproducibility is critical, I would suggest defining (and seeding) a random policy instead, using PyTorch/Jax/Numpy/TF. You'd be losing the ease to sample from any kind of space (barely, it should be easy to write code that works for any space), but you would gain a lot of control over what you want to be fully reproducible.
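A minimal sketch of that suggestion, using a seeded NumPy Generator as the random policy instead of space.sample(). The random_policy helper is hypothetical (not part of Gym); the point is that reproducibility depends only on the generator you control, not on any Space's internal RNG:

```python
import numpy as np

def random_policy(rng, n_envs, n_actions):
    # One discrete action per sub-environment, drawn from our own RNG.
    return rng.integers(low=0, high=n_actions, size=n_envs)

# First run with an explicit seed.
rng = np.random.default_rng(1)
first = [random_policy(rng, 1, 2) for _ in range(20)]

# Re-seed with the same value: the sequence must repeat exactly.
rng = np.random.default_rng(1)
second = [random_policy(rng, 1, 2) for _ in range(20)]

assert np.array_equal(first, second)
```

With the policy seeded this way, whether Gym's batched spaces carry over their RNG becomes irrelevant to reproducibility.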

RedTachyon commented 2 years ago

Should we include a disclaimer like "This method is meant to serve as a quick and dirty way of getting a valid action, and should not be used in situations where reproducibility is desirable" around space.sample()?

jkterry1 commented 2 years ago

I like that disclaimer

pseudo-rnd-thoughts commented 2 years ago

Looking at creating a solution for this: the problem is in gym.vector.utils.spaces, because when a batched space is created, the new space is not given a seed based on the original space. The difficulty is that this seed value would need to be an integer, which cannot be recovered from an already-instantiated space.

Therefore, I have modified the Space.__init__ seed parameter to also accept a raw seeding.RandomNumberGenerator object. Using this modification, the first env provided to a sync vector env is used for the action space seeding.
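A minimal sketch of that idea, with a simplified stand-in class (this is not Gym's actual Space implementation; the class body and attribute names here are illustrative only). The seed parameter accepts either an integer or an existing generator object, so a batched space can reuse a sub-space's RNG directly without needing to extract an integer seed from it:

```python
import numpy as np

class Space:
    """Simplified stand-in for gym.spaces.Space (hypothetical)."""

    def __init__(self, seed=None):
        if isinstance(seed, np.random.Generator):
            # A raw generator is adopted as-is, so no integer seed
            # ever needs to be recovered from an existing space.
            self._np_random = seed
        else:
            # Ordinary integer (or None) seeding still works.
            self._np_random = np.random.default_rng(seed)

# The batched space shares the sub-space's generator object.
shared = np.random.default_rng(1)
batched = Space(seed=shared)

# Plain integer seeding is unchanged.
plain = Space(seed=1)
```

In the real PR the accepted type is gym's seeding.RandomNumberGenerator rather than a bare NumPy Generator, but the mechanism is the same.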

I will create a PR with this solution to allow review and comments

Edit: PR is https://github.com/openai/gym/pull/2727

> An additional concern is that this managed to get through CI tests

There don't appear to be any tests for the batched space seeds; I'm adding them as part of the PR.

pseudo-rnd-thoughts commented 2 years ago

@vwxyzjn Thanks, this has been solved by https://github.com/openai/gym/pull/2727. Could you close this issue, please?