openai / gym

A toolkit for developing and comparing reinforcement learning algorithms.
https://www.gymlibrary.dev

[Frostbite] env doesn't return done=True on death, but goes into "Demo Play" mode #1539

Closed by artofbeinghuman 5 years ago

artofbeinghuman commented 5 years ago

Hello, I have stumbled upon a peculiar behavior in the FrostbiteNoFrameskip-v4 environment. Consider the following code snippet, where I run the env indefinitely and take action 0 at every step. According to env.get_action_meanings(), action 0 is NOOP, so the agent does nothing.

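```python
import gym

env = gym.make("FrostbiteNoFrameskip-v4")
ob = env.reset()
while True:
    # Action 0 is NOOP: the agent does nothing
    _, _, done, _ = env.step(0)
    env.render()
    if done:
        # Expected once the agent has lost its last life
        break
env.close()
```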

As expected, the agent stands around doing nothing until he freezes to death, at which point one life is deducted. This repeats until he runs out of lives. At that point I would expect the final env.step(0) to return done=True so that I can break out of the loop. Instead, the environment goes into a mode that I can only describe as "Demo Play", as if it were showcasing the game in a video (screenshot below). In this "Demo Play" mode the agent stays on screen indefinitely, dying several times without losing lives and without gaining any points.

[Screenshot from 2019-06-18 18-15-29]

If I change the toy example above so that the agent moves down at every step (env.step(5)), then when he dies on his last life the environment returns done=True and the script exits the while loop as expected.
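Because the NOOP run never terminates, it can help to reproduce both cases under a step cap and record how the lives counter changes along the way. The following is a minimal sketch assuming the old 4-tuple `step()` API used above. `run_until_done` and `StubFrostbite` are hypothetical names, and the stub only imitates the behavior described in this issue (it starts with an arbitrary 4 lives and loses one every 10 steps), so no ROM is needed to run it.

```python
from typing import List, Tuple


class StubFrostbite:
    """Hypothetical stand-in that imitates the reported behavior (not the real game).

    Starts with 4 lives and loses one every 10 steps. With NOOP (0) the lives
    counter stalls at 1 and done never comes; any other action (e.g. DOWN, 5)
    lets lives reach 0, at which point done=True is returned.
    """

    def reset(self):
        self.t, self.lives = 0, 4
        return None  # the stub has no observations

    def step(self, action):
        self.t += 1
        done = False
        if self.t % 10 == 0:  # a "death" every 10 steps
            floor = 1 if action == 0 else 0  # NOOP deaths never take the last life
            self.lives = max(floor, self.lives - 1)
            done = self.lives == 0
        return None, 0.0, done, {"ale.lives": self.lives}


def run_until_done(env, action: int, max_steps: int = 100_000) -> Tuple[bool, int, List[int]]:
    """Repeat one action until done=True or max_steps is reached.

    Returns (done_reached, steps_taken, successive distinct 'ale.lives' values).
    """
    env.reset()
    lives_seen: List[int] = []
    for step in range(1, max_steps + 1):
        _, _, done, info = env.step(action)
        lives = info.get("ale.lives")
        if not lives_seen or lives != lives_seen[-1]:
            lives_seen.append(lives)  # log only when the counter changes
        if done:
            return True, step, lives_seen
    return False, max_steps, lives_seen
```

Against the real game one would pass `gym.make("FrostbiteNoFrameskip-v4")` in place of the stub; the `max_steps` cap is what keeps the NOOP case from spinning forever.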

What is going on?

Best, Marvin

artofbeinghuman commented 5 years ago

The problem seems to be that the agent's last life is never deducted when he freezes to death: info stays at {'ale.lives': 1}. Since info never reaches {'ale.lives': 0}, the environment never returns done=True either. However, if the agent dies by drowning (i.e. if you supply the DOWN action at every step), then when he dies on his last life, ale.lives is set to 0 and done=True is returned.
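Given that diagnosis, one possible stopgap on the user side (not something gym provides) is to end the episode once the game stops making progress, since in "Demo Play" neither the score nor the lives counter changes. This is a minimal sketch assuming the old 4-tuple step API. `NoProgressTimeout`, its `patience` parameter, the `stalled` info key and `DemoModeStub` are all hypothetical names, and a sensible `patience` for real Frostbite would have to be found by experiment.

```python
class NoProgressTimeout:
    """Hypothetical guard (not part of gym): force done=True after `patience`
    consecutive steps with zero reward and an unchanged 'ale.lives' value."""

    def __init__(self, env, patience=10_000):
        self.env = env
        self.patience = patience

    def reset(self):
        self._stale, self._lives = 0, None
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        lives = info.get("ale.lives")
        if reward != 0 or lives != self._lives:
            self._stale = 0  # progress: points scored or the lives counter moved
        else:
            self._stale += 1
        self._lives = lives
        if not done and self._stale >= self.patience:
            done = True
            info = dict(info, stalled=True)  # mark the forced termination
        return obs, reward, done, info


class DemoModeStub:
    """Hypothetical env stuck in 'demo mode': no reward, lives fixed at 1."""

    def reset(self):
        return None

    def step(self, action):
        return None, 0.0, False, {"ale.lives": 1}
```

The trade-off is that a stall heuristic can also cut off a genuinely idle but live episode, so `patience` should be well above the longest expected gap between rewards or deaths.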

Can anybody with a bit more experience say whether this has to be fixed in gym, or whether it is actually a problem in the underlying ALE?

Thanks, Marvin

christopherhesse commented 5 years ago

It looks like gym just calls game_over, which calls isTerminal on the environment (https://github.com/mgbellemare/Arcade-Learning-Environment/blob/f7fff8733c8cc0f54d749ddeaf29bd7f478d6f0f/src/games/supported/Frostbite.cpp#L61). This certainly looks like a bug, just not a bug in gym. Could you please file it on the ALE repo? https://github.com/mgbellemare/Arcade-Learning-Environment/issues
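To make the point concrete: in the gym releases of that time, `done` came straight from the ALE's game-over flag, while `ale.lives` was only copied into `info`. Whether the episode ends therefore depends entirely on the ALE's terminal check, which is why the fix belongs in the ALE rather than in gym. The sketch below is a simplified paraphrase of `AtariEnv.step` under that assumption (observation handling and action-set mapping omitted), driven by a hypothetical `FakeALE` object standing in for the real ALE interface in the state described in this issue.

```python
class FakeALE:
    """Hypothetical stand-in for the ALE interface in the reported state:
    one life left on the counter, but the game-over flag never set."""

    def act(self, action):
        return 0  # no points scored

    def game_over(self):
        return False

    def lives(self):
        return 1


def atari_step(ale, action, frameskip=1):
    """Simplified sketch of how gym's AtariEnv.step built (reward, done, info).

    frameskip=1 matches the NoFrameskip-v4 variant; the observation is omitted.
    """
    reward = 0.0
    for _ in range(frameskip):
        reward += ale.act(action)
    done = ale.game_over()  # termination is decided entirely by the ALE
    info = {"ale.lives": ale.lives()}  # lives are reported, not used for done
    return reward, done, info
```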