maciejors closed this issue 11 months ago.
Good idea. I played with environment seeding a while back while implementing random states in PPO. One thing I noticed back then was that, even with a seeded reset and a seeded agent, only the first episode was fully reproducible. For some reason, subsequent episodes diverged more and more over time. I remember reading something about wind power (I should mention I was using LunarLander) not using the seed passed to `reset()`, but I can't find it now, so I'm most likely misremembering something.
Anyway, we should still investigate what caused it.
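One way to narrow this down is to run the same seeded setup twice and report the first episode where the two runs stop matching. Below is a minimal sketch in plain Python. It assumes a hypothetical `rollout(seed, n_episodes)` callable that returns one trajectory (e.g. a list of observations or rewards) per episode. `toy_rollout` and `leaky_rollout` are illustrative stand-ins for a real env-plus-agent loop, not existing code.

```python
import random
from typing import Callable, List, Optional, Sequence

Trajectory = List[float]


def first_divergent_episode(
    rollout: Callable[[int, int], Sequence[Trajectory]],
    seed: int,
    n_episodes: int,
) -> Optional[int]:
    """Run `rollout` twice with the same seed and return the index of the
    first episode whose trajectories differ, or None if all episodes match."""
    run_a = rollout(seed, n_episodes)
    run_b = rollout(seed, n_episodes)
    for i, (ep_a, ep_b) in enumerate(zip(run_a, run_b)):
        if ep_a != ep_b:
            return i
    return None


def toy_rollout(seed: int, n_episodes: int) -> List[Trajectory]:
    """Hypothetical stand-in: every random draw comes from one RNG that is
    seeded once, so two runs with the same seed are identical."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(5)] for _ in range(n_episodes)]


def leaky_rollout(seed: int, n_episodes: int) -> List[Trajectory]:
    """Hypothetical buggy variant: after the first episode it draws from the
    global `random` module instead of the seeded RNG, mimicking a component
    that ignores the seed passed to reset()."""
    rng = random.Random(seed)
    episodes = [[rng.random() for _ in range(5)]]
    for _ in range(n_episodes - 1):
        episodes.append([random.random() for _ in range(5)])
    return episodes
```

With these stand-ins, `first_divergent_episode(toy_rollout, seed=0, n_episodes=10)` returns `None`, while the leaky variant diverges at episode 1. That matches the "only the first episode is reproducible" symptom described above. Swapping in the real LunarLander loop would show whether the same pattern appears there.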
It's probably not going to be an issue with minigrid, but I'm a bit concerned about Atari games, since they run on an emulated console. Hopefully the Gym environment seed is passed down to the emulator somehow.
We've already got a `random_state` parameter for agents, but to make experiments fully reproducible, we need to add a similar parameter to environments.
Gymnasium environments can have a `seed` parameter passed to the `reset()` method (example). We need to figure out how to pass it in a way that doesn't make these environments identical each time they're reset.
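Gymnasium documents the semantics of `reset(seed=...)` as follows. Passing an integer re-initialises the environment's RNG. Calling `reset()` with no seed keeps drawing from the existing RNG. The recommended pattern is to pass a seed once, right after creating the environment, and never again. That gives a fully reproducible run without making every episode identical.

Below is a minimal sketch of that pattern in plain Python. `ToyEnv` is a hypothetical environment that imitates the convention using `random.Random` instead of Gymnasium's `np_random`. `run_experiment` and `run_experiment_naive` are illustrative names, not existing functions in our codebase.

```python
import random
from typing import List, Optional


class ToyEnv:
    """Hypothetical environment mimicking Gymnasium's reset(seed=...)
    convention with the standard-library RNG."""

    def __init__(self) -> None:
        self._rng: Optional[random.Random] = None

    def reset(self, seed: Optional[int] = None) -> float:
        # An explicit seed (re)creates the RNG. seed=None keeps the existing
        # RNG, or creates an entropy-seeded one on the very first reset.
        if seed is not None or self._rng is None:
            self._rng = random.Random(seed)
        return self._rng.random()  # stands in for the initial observation


def run_experiment(random_state: int, n_episodes: int) -> List[float]:
    """Seed the env only on the first reset, so episodes differ from each
    other but the whole run repeats exactly for the same random_state."""
    env = ToyEnv()
    first_obs = [env.reset(seed=random_state)]
    first_obs += [env.reset() for _ in range(n_episodes - 1)]
    return first_obs


def run_experiment_naive(random_state: int, n_episodes: int) -> List[float]:
    """Passing the same seed on every reset makes every episode start
    identically, which is the behaviour we want to avoid."""
    env = ToyEnv()
    return [env.reset(seed=random_state) for _ in range(n_episodes)]
```

In real Gymnasium environments, `action_space` and `observation_space` have their own RNGs (seeded via `env.action_space.seed(...)`). Any random action sampling would therefore need to be seeded separately from `reset()`.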