openai / atari-py

A packaged and slightly-modified version of https://github.com/bbitmaster/ale_python_interface
GNU General Public License v2.0

Breakout doesn't end after losing the game #76

familyld closed this issue 4 years ago

familyld commented 4 years ago

[Image: Breakout]

As mentioned in the title and shown in the gif, the game keeps going after the paddle misses the ball. I checked the done flag returned by env.step(action) and it stays False. Is this normal? I am using BreakoutNoFrameskip-v0, and my evaluation code is as follows:

import numpy as np

def eval_policy(policy, env, seed, eval_episodes=1):
    # Note: seed is accepted but not used in this snippet.
    avg_reward = 0.
    for _ in range(eval_episodes):
        state, done = env.reset(), False
        episode_reward, steps = 0., 0  # per-episode counters
        while not done:
            env.env.render()
            action = policy.select_action(np.array(state), eval=True)
            state, reward, done, _ = env.step(action)
            episode_reward += reward
            steps += 1
            # sleep(0.02)

        avg_reward += episode_reward
        print(f"Evaluation over {steps} steps: {episode_reward:.3f}")

    avg_reward /= eval_episodes

    print("---------------------------------------")
    print(f"Evaluation over {eval_episodes} episodes: {avg_reward:.3f}")
    print("---------------------------------------")
    return avg_reward

where env is created by:

# Create environment, add wrapper if necessary and create env_properties
def make_env(env_name, atari_preprocessing):
    env = gym.make(env_name)

    is_atari = gym.envs.registry.spec(env_name).entry_point == 'gym.envs.atari:AtariEnv'
    env = AtariPreprocessing(env, **atari_preprocessing) if is_atari else env

    state_dim = (
        atari_preprocessing["state_history"], 
        atari_preprocessing["frame_size"], 
        atari_preprocessing["frame_size"]
    ) if is_atari else env.observation_space.shape[0]

    return (
        env,
        is_atari,
        state_dim,
        env.action_space.n
    )

I am using atari-py==0.2.6 and gym==0.17.2.

JesseFarebro commented 4 years ago

In Breakout, after a life is lost the game waits until your agent takes the "fire" action. Once that action is executed, the ball drops. If the agent never executes "fire" while the game is waiting, nothing happens and the episode does not advance. Try running a random policy on Breakout and you'll see that the game proceeds normally.
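One way to work around this during evaluation is to press "fire" automatically at the start of an episode and after every lost life. The following is only a minimal sketch, not part of atari-py or gym: run_episode and select_action are hypothetical names, and it assumes the old gym API (reset() returns the observation, step() returns four values), that the remaining lives are available through env.unwrapped.ale.lives(), and that action index 1 is FIRE, as it is in Breakout's action set.

```python
FIRE = 1  # assumption: index 1 is FIRE (check env.unwrapped.get_action_meanings())


def run_episode(env, select_action, max_steps=10_000):
    """Run one episode, pressing FIRE at reset and after every lost life.

    `select_action` is a hypothetical callable mapping a state to an action.
    `max_steps` is a safety cap so a stuck episode cannot loop forever.
    """
    state = env.reset()
    lives = env.unwrapped.ale.lives()
    need_fire = True  # the ball has to be launched at the start of the game
    total_reward, done, steps = 0.0, False, 0

    while not done and steps < max_steps:
        # Override the policy only on the step where the game is waiting.
        action = FIRE if need_fire else select_action(state)
        state, reward, done, _ = env.step(action)
        total_reward += reward
        steps += 1

        new_lives = env.unwrapped.ale.lives()
        need_fire = new_lives < lives  # a life was lost: relaunch the ball
        lives = new_lives

    return total_reward, steps
```

With the evaluation loop from the question, this could be called as run_episode(env, lambda s: policy.select_action(np.array(s), eval=True)). Whether forcing "fire" is appropriate depends on how the agent was trained, so treat it as a debugging aid rather than a fix.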

In your case, your agent is probably acting greedily and never executes the "fire" action, so the episode cannot continue. Try using an epsilon-greedy policy, or increase epsilon if you are already using one.
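For reference, an epsilon-greedy choice can be sketched in a few lines. This is an illustrative helper, not code from the poster's project: the name epsilon_greedy, its parameters, and the default epsilon of 0.05 are all assumptions.

```python
import random


def epsilon_greedy(greedy_action, n_actions, epsilon=0.05, rng=random):
    """Return a uniformly random action with probability epsilon,
    otherwise return the greedy action unchanged."""
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    return greedy_action
```

In the evaluation loop from the question, the policy's output could be wrapped as epsilon_greedy(policy.select_action(np.array(state), eval=True), env.action_space.n). With a small nonzero epsilon, "fire" is eventually sampled while the game is waiting, so the episode can continue.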

familyld commented 4 years ago

@JesseFarebro Thanks a lot for your quick reply. I was wondering whether some action had to be taken for the game to end, and your answer settles my question.