Closed tongzhoumu closed 5 years ago
Do you get the same issue if you use 'FreewayNoFrameskip-v4'?
Yes, if I manually skip 4 frames, it also happens.
Hmm, why does it only happen if you manually skip 4 frames?
No, it happens on both Deterministic-v4 and NoFrameskip-v4.
For Deterministic-v4, try this:
import gym
import copy
import numpy as np

actions = [1, 2, 0, 0, 1, 0, 1, 0, 0, 0, 1, 2, 2, 1, 1, 2, 1, 0, 2, 1, 1, 0, 1, 2, 2, 1, 1, 0, 2, 1, 0, 0, 1, 0, 2, 2, 2, 2, 2, 0, 1, 2, 1, 2, 1, 1, 2, 1, 1, 2, 0, 2, 0, 2, 2, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0]
final_ob_in_last_game = None
count = 0
while True:
    count += 1
    print('Game', count)
    env = gym.make('FreewayDeterministic-v4')
    env.reset()
    for action in actions:
        ob, reward, done, _ = env.step(action)
    # Compare the final observation of this game with that of the previous game.
    if final_ob_in_last_game is not None and not ((final_ob_in_last_game == ob).all()):
        print(np.nonzero(final_ob_in_last_game - ob))
    final_ob_in_last_game = copy.deepcopy(ob)
For NoFrameskip-v4, try this:
import gym
import copy
import numpy as np

game = 'Freeway'
actions = [1, 2, 0, 0, 1, 0, 1, 0, 0, 0, 1, 2, 2, 1, 1, 2, 1, 0, 2, 1, 1, 0, 1, 2, 2, 1, 1, 0, 2, 1, 0, 0, 1, 0, 2, 2, 2, 2, 2, 0, 1, 2, 1, 2, 1, 1, 2, 1, 1, 2, 0, 2, 0, 2, 2, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0]
final_ob_in_last_game = None
count = 0
while True:
    count += 1
    print('Game', count)
    env = gym.make(game+'NoFrameskip-v4')
    env.reset()
    for action in actions:
        # Manually skip 4 frames by repeating each action.
        for _ in range(4):
            ob, reward, done, _ = env.step(action)
    if final_ob_in_last_game is not None and not ((final_ob_in_last_game == ob).all()):
        # print(np.nonzero(final_ob_in_last_game - ob))
        print('Different observation!')
    final_ob_in_last_game = copy.deepcopy(ob)
Ah, I didn't realize it was so sensitive to the exact action sequence.
It looks like you're not seeding the environment, so it uses a random seed. Here's a version that calls seed:
import gym

game = 'Freeway'
actions = [1, 2, 0, 0, 1, 0, 1, 0, 0, 0, 1, 2, 2, 1, 1, 2, 1, 0, 2, 1, 1, 0, 1, 2, 2, 1, 1, 0, 2, 1, 0, 0, 1, 0, 2, 2, 2, 2, 2, 0, 1, 2, 1, 2, 1, 1, 2, 1, 1, 2, 0, 2, 0, 2, 2, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0]
final_ob_in_last_game = None
count = 0
while True:
    count += 1
    print('Game', count)
    env = gym.make(game+'NoFrameskip-v4')
    env.seed(0)
    env.reset()
    for action in actions:
        for _ in range(4):
            ob, _, _, _ = env.step(action)
    if final_ob_in_last_game is not None and not ((final_ob_in_last_game == ob).all()):
        print('Different observation!')
    final_ob_in_last_game = ob.copy()
    env.close()
Does it still happen if you seed the environment?
Hi, your code works. However, I find that I need to seed the environment each time before reset(); otherwise it is still not entirely deterministic. For example, if I only seed the env once:
import gym

game = 'Freeway'
actions = [1, 2, 0, 0, 1, 0, 1, 0, 0, 0, 1, 2, 2, 1, 1, 2, 1, 0, 2, 1, 1, 0, 1, 2, 2, 1, 1, 0, 2, 1, 0, 0, 1, 0, 2, 2, 2, 2, 2, 0, 1, 2, 1, 2, 1, 1, 2, 1, 1, 2, 0, 2, 0, 2, 2, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0]
env = gym.make(game+'NoFrameskip-v4')
env.seed(0)  # seeded only once, before the first reset
final_ob_in_last_game = None
count = 0
while True:
    count += 1
    print('Game', count)
    env.reset()
    for action in actions:
        for _ in range(4):
            ob, _, _, _ = env.step(action)
    if final_ob_in_last_game is not None and not ((final_ob_in_last_game == ob).all()):
        print('Different observation!')
    final_ob_in_last_game = ob.copy()
env.close()
Does that mean env.reset() doesn't fully reset the environment?
In general, no. It doesn't reset the RNG state, so you have to call env.seed(0) before each reset, as you pointed out. This is standard behavior for environments with random initial starting conditions, which apparently includes ALE games in gym.
It's confusing that the "Deterministic" version of an environment is not actually deterministic.
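To illustrate the seed-before-every-reset point without needing the Atari ROMs, here is a minimal sketch using a toy environment. `ToyEnv` and `rollout` are hypothetical names invented for this example, not part of gym; the toy only assumes what is described in this thread, namely that reset() draws a random start state from the env's RNG without rewinding it.

```python
import random


class ToyEnv:
    """Hypothetical stand-in for a gym-style env with a random start state."""

    def __init__(self):
        self._rng = random.Random()
        self._state = 0

    def seed(self, seed):
        # Re-create the RNG so that the next reset() is reproducible.
        self._rng = random.Random(seed)

    def reset(self):
        # Draws from the RNG but does not rewind it (assumed behavior).
        self._state = self._rng.randrange(1000)
        return self._state

    def step(self, action):
        # Deterministic transition; all randomness comes from reset().
        self._state = (self._state * 31 + action) % 1000003
        return self._state, 0.0, False, {}


def rollout(env, actions, seed=None):
    """Play a fixed action sequence and return the final observation."""
    if seed is not None:
        env.seed(seed)
    ob = env.reset()
    for action in actions:
        ob, _, _, _ = env.step(action)
    return ob


actions = [1, 2, 0, 0, 1]
env = ToyEnv()

# Seeding before every reset: each game ends in the same observation.
reseeded = [rollout(env, actions, seed=0) for _ in range(5)]

# Seeding once: only the first game matches; later resets keep consuming the RNG.
env.seed(0)
seeded_once = [rollout(env, actions) for _ in range(5)]

print('reseeded each game:', reseeded)
print('seeded once:       ', seeded_once)
```

In the real scripts above, the equivalent change is simply moving env.seed(0) inside the loop so it runs right before env.reset().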
Got it. Thank you!
I thought "Deterministic-v4" was a deterministic version of any Atari game. However, I found that FreewayDeterministic-v4 is not deterministic. I executed a fixed action sequence several times, but I could not get exactly the same observation each time. The following code can reproduce it. My gym version is 0.12.1, and my Python version is 3.5.2.
Followup: I just found that "FrostbiteDeterministic-v4" is also not deterministic.