env.action_space.sample() doesn't follow env.seed() ?

openai / gym

A toolkit for developing and comparing reinforcement learning algorithms.

https://www.gymlibrary.dev


env.action_space.sample() doesn't follow env.seed() ? #681

Closed tylerlekang closed 7 years ago

tylerlekang commented 7 years ago

When I set env.seed(0) (or some other seed), I expected all random elements of env to behave deterministically. However, env.action_space.sample() still seems to produce random output.

import gym

a1 = []
a2 = []

env1 = gym.make('FrozenLake-v0')
env1.seed(0)

s1 = env1.reset()

for _ in range(4):
    a1.append(env1.action_space.sample())

env2 = gym.make('FrozenLake-v0')
env2.seed(0)

s2 = env2.reset()

for _ in range(4):
    a2.append(env2.action_space.sample())

print(a1)
print(a2)

This produces different results for a1 and a2. For example:

[1, 0, 2, 2]
[0, 3, 2, 1]

Perhaps this was/is desired, but as mentioned above, I thought that setting env.seed() would override that.

fjwolski commented 7 years ago

See how spaces sample in the gym source code, e.g. https://github.com/openai/gym/blob/339415aa03a9b039a51f67798a44f8cd21464091/gym/spaces/box.py#L28-L29. They use a separate random number generator that lives in gym.spaces.prng. If you want the action / observation space to sample deterministically, you will need to:

from gym.spaces.prng import seed
seed(123)
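To illustrate why seeding the environment does not affect space sampling, here is a minimal toy model using only the standard library. It is not gym's implementation: ToySpace, ToyEnv, and rollout are hypothetical names, and random.Random stands in for the separate generator described above.

```python
import random


class ToySpace:
    """Hypothetical stand-in for a discrete action space with its own RNG."""

    def __init__(self, n, seed=None):
        self.n = n
        # Private generator, independent of any environment generator.
        self._rng = random.Random(seed)

    def seed(self, seed):
        self._rng.seed(seed)

    def sample(self):
        return self._rng.randrange(self.n)


class ToyEnv:
    """Hypothetical environment whose seed() only touches its own generator."""

    def __init__(self):
        self.action_space = ToySpace(4)
        self._rng = random.Random()

    def seed(self, seed):
        # Only the environment's generator is seeded; the space's is untouched.
        self._rng.seed(seed)


def rollout(seed_space):
    """Seed the env, optionally seed the space too, then sample 8 actions."""
    env = ToyEnv()
    env.seed(0)
    if seed_space:
        env.action_space.seed(0)
    return [env.action_space.sample() for _ in range(8)]


# Seeding only the env: the two action lists are drawn from unseeded
# generators, so they will usually differ between runs.
env_only_a = rollout(seed_space=False)
env_only_b = rollout(seed_space=False)

# Seeding the space's own generator as well: the action lists match.
both_a = rollout(seed_space=True)
both_b = rollout(seed_space=True)
print(both_a == both_b)  # True
```

In this sketch the fix is the same as in the thread: the space's generator has to be seeded separately from the environment's.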

tylerlekang commented 7 years ago

OK, thanks for that info.

I was questioning if that should be the case, given a seemingly "overarching" nature of a simple line like env.seed(). BUT, if that is the way they want it to be done (or perhaps how it has to be done), I'm fine with that.

jfaleiro commented 5 years ago

For newer versions, use env.action_space.np_random.seed(123). Depending on the specific environment, you might need env.seed(123) for deterministic behavior.

orrp commented 2 years ago

Is there a reason why env.action_space.np_random.seed is not set by env.seed?

RedTachyon commented 2 years ago

@orrp I think the main reason is that env.action_space.np_random doesn't really have to be used often, and probably shouldn't be used for actual algorithms anyways, so it's only initialized when it's actually necessary. We already have the method env.action_space.seed()

Oh, and env.seed is deprecated now

DominikRoB commented 2 years ago

@RedTachyon Can you point me to a place where the current handling of seeding is described?

I was just about to set up env.seed, only to see here that it's deprecated?

stefanbschneider commented 2 years ago

> For newer versions use env.action_space.np_random.seed(123) - depending on the specific environment you might need env.seed(123) for a deterministic behavior.

This means that any space can be seeded independently of an environment. Spaces can also optionally take a seed during construction.
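A short sketch of seeding a space on its own, without any environment. It assumes a recent Gym or Gymnasium release in which spaces accept a seed keyword and expose a seed() method. The load_spaces helper is a hypothetical convenience that picks whichever package is installed, and the variable names are illustrative only.

```python
import importlib


def load_spaces():
    """Return the spaces module from Gymnasium or Gym, or None if neither is installed."""
    for name in ("gymnasium.spaces", "gym.spaces"):
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    return None


spaces = load_spaces()

if spaces is not None:
    # Seed at construction time (assumes a version that supports seed=...).
    space_a = spaces.Discrete(4, seed=123)
    space_b = spaces.Discrete(4, seed=123)
    from_constructor_a = [int(space_a.sample()) for _ in range(4)]
    from_constructor_b = [int(space_b.sample()) for _ in range(4)]
    print(from_constructor_a == from_constructor_b)

    # Re-seed existing spaces through the seed() method.
    space_a.seed(7)
    space_b.seed(7)
    reseeded_a = [int(space_a.sample()) for _ in range(4)]
    reseeded_b = [int(space_b.sample()) for _ in range(4)]
    print(reseeded_a == reseeded_b)
```

With an environment, the same call would be env.action_space.seed(123), as mentioned earlier in the thread.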