openai / gym

A toolkit for developing and comparing reinforcement learning algorithms.
https://www.gymlibrary.dev

Why does gym (Atari) have a different definition of a game compared to ALE? #588

Closed shanyou92 closed 5 years ago

shanyou92 commented 7 years ago

Hi, I am using gym (Atari) and ALE at the same time, and I am playing Atari games, for example Breakout.

In gym it's "Breakout-v0", and it has 6 actions. But in ALE, "breakout.bin" has 4 actions.

Is this the same game?

tlbtlbtlb commented 7 years ago

I'm not sure where you see 6. I see 4:

>>> import gym
>>> gym.version.VERSION
'0.8.1'
>>> env=gym.make('Breakout-v0')
[2017-05-14 16:04:19,806] Making new env: Breakout-v0
>>> env
<TimeLimit<AtariEnv<Breakout-v0>>>
>>> env.action_space
Discrete(4)
>>> env.unwrapped.get_action_meanings()
['NOOP', 'FIRE', 'RIGHT', 'LEFT']
shanyou92 commented 7 years ago

Hi @tlbtlbtlb, it seems that a higher version of gym has a different definition?

>>> import gym
>>> gym.version.VERSION
'0.8.2'
>>> env=gym.make('Breakout-v0')
[2017-05-15 11:34:41,915] Making new env: Breakout-v0
>>> env.action_space
Discrete(6)
>>> env.unwrapped.get_action_meanings()
['NOOP', 'FIRE', 'RIGHT', 'LEFT', 'RIGHTFIRE', 'LEFTFIRE']

tlbtlbtlb commented 7 years ago

For Breakout, 4 makes more sense because the fire button is only used to start the game. A learning agent will learn substantially faster without the extra actions. The difference in action space comes from the version of atari-py, not gym. You should be using atari-py 0.1.1.
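If you're unsure which atari-py is actually installed, a quick check (a sketch using setuptools' pkg_resources; the PyPI package name is atari-py):

>>> import pkg_resources
>>> pkg_resources.get_distribution('atari-py').version
'0.1.1'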

koulanurag commented 7 years ago

@tlbtlbtlb @JustinYou1992 As per my observation, the FIRE button is required every time the agent loses a life. In Breakout, the agent has 5 lives per episode, as indicated in the "info" dict. Also, the ball will simply disappear if we don't press FIRE after the loss of a life.

This observation can easily be replicated by:

>>> import gym
>>> import random
>>> env = gym.make('Breakout-v0')
[2017-05-16 21:38:47,888] Making new env: Breakout-v0
>>> observation = env.reset()
>>> _, _, done, _ = env.step(1)  # Fire
>>> while not done:
...     env.render()
...     _, _, done, _ = env.step(random.choice([2, 3]))  # Right, Left
...

In this case, the episode never terminates.

For simple, quick learning, the old action space offered RIGHTFIRE and LEFTFIRE, so a separate FIRE press was not needed.

tlbtlbtlb commented 7 years ago

That seems like a slight problem for hand-written agents, but a learning agent (the goal of gym) should be able to figure it out.

shanyou92 commented 7 years ago

@koulanurag @tlbtlbtlb In that case, how do we know which game mode is used in most current papers? It seems unfair if different papers use different game configurations but report their final scores as if they were comparable... 😂

tlbtlbtlb commented 7 years ago

One of the motivations for gym is to release environments that are at least consistent between researchers. We're planning to release wrappers soon that (we think) make them identical to the published work from DeepMind. I believe their results use a wrapper that presses the fire button automatically to start each game. @shelhamer, do you know what action space they use for Breakout?
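For reference, a minimal sketch of such a fire-on-reset wrapper (illustrative only; the name mirrors the FireResetEnv wrapper later shipped in OpenAI Baselines, and it assumes the classic (obs, reward, done, info) step API):

import gym

class FireResetEnv(gym.Wrapper):
    def __init__(self, env):
        super(FireResetEnv, self).__init__(env)
        # Only meaningful for games whose action 1 is FIRE.
        assert env.unwrapped.get_action_meanings()[1] == 'FIRE'

    def reset(self, **kwargs):
        self.env.reset(**kwargs)
        # Press FIRE once so games like Breakout actually launch the ball.
        obs, _, done, _ = self.env.step(1)
        if done:
            obs = self.env.reset(**kwargs)
        return obs

env = FireResetEnv(gym.make('Breakout-v0'))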

shelhamer commented 7 years ago

what action space they use for Breakout?

The original DQN and the Nature edition of the DQN made use of the xitari fork of ALE, which has this minimal action space for Breakout: https://github.com/deepmind/xitari/blob/master/games/supported/Breakout.cpp#L88-L91. Note that this is the same action space for Breakout as in the new v4 Atari gym environment: https://github.com/openai/atari-py/blob/master/atari_py/ale_interface/src/games/supported/Breakout.cpp#L83-L86.
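This is easy to confirm locally (a quick check, assuming an atari-py new enough to use the minimal set, so both registered versions should agree):

import gym

# Compare the registered v0 and v4 Breakout action sets.
for env_id in ['Breakout-v0', 'Breakout-v4']:
    env = gym.make(env_id)
    print(env_id, env.unwrapped.get_action_meanings())
    env.close()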

shanyou92 commented 7 years ago

@shelhamer Thanks. I find that when I update to atari-py 0.0.21, every version of Breakout in gym has 4 actions, including v0 and v4. But it is still somewhat concerning, since Breakout is just a single game and there are 49+ games in total used for research; we have no idea whether the others are also inconsistent with the ALE environment with respect to the number of actions.
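One rough way to audit the whole suite is to walk gym's registry and record each game's action count (a sketch assuming the classic envs.registry.all() API; the '-v4' filter and the exclusions are heuristics to pick one variant per game):

import gym
from gym import envs

# One "-v4" spec per Atari game, skipping RAM-observation and
# frameskip variants; print the size of each action space.
atari_ids = sorted(
    spec.id for spec in envs.registry.all()
    if spec.id.endswith('-v4')
    and '-ram' not in spec.id
    and 'Deterministic' not in spec.id
    and 'NoFrameskip' not in spec.id)
for env_id in atari_ids:
    env = gym.make(env_id)
    print(env_id, env.action_space.n)
    env.close()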

dgriff777 commented 7 years ago

Can we get rid of time limits as well? For a few Atari games they definitely limit performance. I assume the time limit is meant to aid training, but we can impose that ourselves during training and lift the limit once performance gets good enough that it becomes a handicap.

christopherhesse commented 5 years ago

It sounds like the original issue here is fixed; please file a new issue if that is not the case.

@JustinYou1992 you can disable the minimal action set with something like AtariEnv(game=game, obs_type='image', frameskip=1, full_action_space=True). There's still about one Atari game with a different action space (Skiing?), but the rest should all be the same.

@dgriff777 You can create your environment using AtariEnv directly, and there will be no time limit.
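Putting both suggestions together (a sketch; the import path is where gym registers its Atari environments):

from gym.envs.atari import AtariEnv

# Constructing AtariEnv directly bypasses gym.make's registered
# TimeLimit wrapper and, with full_action_space=True, uses the full
# 18-action ALE set instead of the per-game minimal set.
env = AtariEnv(game='breakout', obs_type='image',
               frameskip=1, full_action_space=True)
print(env.action_space)  # Discrete(18)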