openai / gym

A toolkit for developing and comparing reinforcement learning algorithms.
https://www.gymlibrary.dev

Frozen Lake returns done on non-terminal states #2170

Closed: aryadas98 closed this issue 3 years ago

aryadas98 commented 3 years ago

From what I understand, env.step returns observation, reward, done, info, and done is supposed to indicate whether the agent has reached a terminal state, i.e. the goal or a hole. But sometimes it returns done=True on non-terminal states. This can be reproduced with the following code:

import gym

env = gym.make("FrozenLake-v0")
env.reset()

max_steps = 100
num_episodes = 10000

terminal_states = set()

for episode in range(num_episodes):
    env.reset()

    for step in range(max_steps):

        action = 0

        state, _, done, _ = env.step(action)

        if done:
            terminal_states.add(state)

print(terminal_states)

Expected output:

{12}

Actual output:

{0, 4, 8, 12}
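For reference, the states that really are terminal can be read off the map layout. A minimal sketch, assuming the default 4x4 layout used by FrozenLake-v0 (copied here as plain strings, not imported from gym) and states numbered row by row as `row * ncols + col`; `terminal_cells` is a hypothetical helper written only for this example:

```python
# Default 4x4 FrozenLake layout: S = start, F = frozen, H = hole, G = goal.
MAP_4X4 = ["SFFF", "FHFH", "FFFH", "HFFG"]


def terminal_cells(desc):
    """Return the indices of all holes and goals, numbered row * ncols + col."""
    ncols = len(desc[0])
    return {
        row * ncols + col
        for row, line in enumerate(desc)
        for col, cell in enumerate(line)
        if cell in "HG"
    }


print(sorted(terminal_cells(MAP_4X4)))  # [5, 7, 11, 12, 15]
```

With action 0 (LEFT) on the default slippery ice, every actual move is left, down, or up, so the agent never leaves column 0. The only terminal state in that column is 12, which is where the expected output {12} comes from.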

After some investigation, it appears that the environment returns done=True once it has taken 100 steps, regardless of the current state. If I set max_steps to any number less than 100, the issue does not occur. However, this behaviour is not mentioned in the documentation: https://gym.openai.com/envs/FrozenLake-v0/ https://github.com/openai/gym/wiki/FrozenLake-v0
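The extra states come from a step limit, not from the map. The sketch below is a simplified, gym-free re-implementation for illustration only: `SlipperyLake` mimics FrozenLake's slippery dynamics (the intended move or either perpendicular move, each with probability 1/3), and `TimeLimitSketch` is a hypothetical stand-in for a step-limit wrapper that forces done=True after a fixed number of steps and marks that case with a `truncated` flag. Gym's own TimeLimit wrapper (4-tuple API) records the same distinction as `info["TimeLimit.truncated"]`, but the classes and the `"truncated"` key here are assumptions, not gym's code.

```python
import random

# Default 4x4 layout; states are numbered row * 4 + col.
MAP_4X4 = ["SFFF", "FHFH", "FFFH", "HFFG"]
LEFT, DOWN, RIGHT, UP = 0, 1, 2, 3
MOVES = {LEFT: (0, -1), DOWN: (1, 0), RIGHT: (0, 1), UP: (-1, 0)}


class SlipperyLake:
    """Simplified FrozenLake: the intended move or one of its two
    perpendicular neighbours is taken, each with probability 1/3."""

    def __init__(self, desc, rng):
        self.desc, self.rng = desc, rng
        self.nrows, self.ncols = len(desc), len(desc[0])

    def reset(self):
        self.state = 0  # the start tile 'S' is in the top-left corner
        return self.state

    def step(self, action):
        row, col = divmod(self.state, self.ncols)
        actual = (action + self.rng.choice([-1, 0, 1])) % 4  # slip
        dr, dc = MOVES[actual]
        row = min(max(row + dr, 0), self.nrows - 1)  # walls clip the move
        col = min(max(col + dc, 0), self.ncols - 1)
        self.state = row * self.ncols + col
        cell = self.desc[row][col]
        return self.state, float(cell == "G"), cell in "HG", {}


class TimeLimitSketch:
    """Hypothetical step-limit wrapper: forces done=True once max_steps
    steps have been taken since the last reset."""

    def __init__(self, env, max_steps):
        self.env, self.max_steps = env, max_steps

    def reset(self):
        self.elapsed = 0
        return self.env.reset()

    def step(self, action):
        state, reward, done, info = self.env.step(action)
        self.elapsed += 1
        if self.elapsed >= self.max_steps:
            info["truncated"] = not done  # True only if the limit ended it
            done = True
        return state, reward, done, info


env = TimeLimitSketch(SlipperyLake(MAP_4X4, random.Random(0)), max_steps=100)
terminated, truncated = set(), set()
for _ in range(10000):
    env.reset()
    done = False
    while not done:
        state, _, done, info = env.step(LEFT)
    if info.get("truncated"):
        truncated.add(state)  # ended by the step limit, not by the map
    else:
        terminated.add(state)  # genuinely reached a hole or the goal

print("terminated:", sorted(terminated))
print("truncated:", sorted(truncated))
```

In this sketch only 12 should show up as a genuine termination; the other states from the report appear only because the step limit cut the episode short.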

aryadas98 commented 3 years ago

I have updated the wiki https://github.com/openai/gym/wiki/FrozenLake-v0

shuruiz commented 3 years ago
for episode in range(num_episodes):
    env.reset()

    for step in range(max_steps):

        action = 0

        state, _, done, _ = env.step(action)

        if done:
            terminal_states.add(state)

In your code you are recording every terminal state you find.

RedTachyon commented 3 years ago

@aryadas98 This is intended behavior: FrozenLake-v0 is registered with max_episode_steps=100, so by default gym.make wraps the environment in a time limit of 100 steps (https://github.com/openai/gym/blob/master/gym/envs/__init__.py#L150). If you want to avoid this, you can import the environment class and instantiate it yourself, or register a different variant as in the linked file.

@jkterry1 closable

(in MDPs, any state is terminal if you're brave enough)