Open ChenyangRan opened 5 months ago
Hey, thanks for the question. It looks like a bug; I'll fix it.
I've only managed to solve the issue for some environments; I can't solve it for the others yet. The discussion is also ongoing here: https://github.com/Farama-Foundation/Gymnasium/issues/1111
To reproduce
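import gymnasium as gym
import panda_gym  # importing registers the Panda environments with gymnasium
from gymnasium.utils.env_checker import data_equivalence

env = gym.make("PandaPickAndPlace-v3")

# Two fixed actions, reused in both runs
action_0 = env.action_space.sample()
action_1 = env.action_space.sample()

# First run: a seeded reset followed by an unseeded reset
env.reset(seed=10)
obs_00, _, _, _, _ = env.step(action_0)
env.reset()
obs_01, _, _, _, _ = env.step(action_1)

# Second run: same seed, same actions
env.reset(seed=10)
obs_10, _, _, _, _ = env.step(action_0)
env.reset()
obs_11, _, _, _, _ = env.step(action_1)

# Observations after the seeded resets should match
assert data_equivalence(obs_00, obs_10)
# Observations after the unseeded resets should match as well
assert data_equivalence(obs_01, obs_11)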
Hi, panda-gym does not seem to let me set the random seed the way gym does, where env.set(seed) makes the results reproducible. When I call env.reset(seed=10) with the same seed, I get the same return values, such as desired_goal. But when I then call env.reset() without a seed, the states of the subsequent resets are not consistent from run to run. Is there a way to guarantee the consistency of the state of each subsequent reset, just like in gym?
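A possible workaround while unseeded resets are not reproducible is to pass an explicit seed to every reset and derive those seeds from a single master seed. This is only a sketch: it relies on the observation above that env.reset(seed=...) with the same seed returns the same values, and the helper names episode_seeds and seeded_resets are hypothetical, not part of panda-gym or gymnasium.

```python
import random


def episode_seeds(master_seed, n_episodes):
    """Derive a reproducible list of per-episode seeds from one master seed."""
    rng = random.Random(master_seed)
    return [rng.randrange(2**31) for _ in range(n_episodes)]


def seeded_resets(env, master_seed, n_episodes):
    """Reset `env` once per episode, always passing an explicit seed.

    `env` is assumed to expose the gymnasium-style reset(seed=...) method.
    Returns whatever each env.reset(seed=...) call returns, in order.
    """
    return [env.reset(seed=s) for s in episode_seeds(master_seed, n_episodes)]
```

With env = gym.make("PandaPickAndPlace-v3"), calling seeded_resets(env, 10, 3) twice should then go through the same three seeds in both runs, so no reset depends on the state left behind by a previous one.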