Closed: aryadas98 closed this issue 3 years ago.
I have updated the wiki https://github.com/openai/gym/wiki/FrozenLake-v0
```python
for episode in range(num_episodes):
    env.reset()
    for step in range(max_steps):
        action = 0
        state, _, done, _ = env.step(action)
        if done:
            terminal_states.add(state)
```
In your code you are recording every state flagged as terminal, including states where the episode was only cut off by the time limit.
@aryadas98 This is intended behavior: by default, `gym.make` initializes the environment with a time limit of 100 steps (https://github.com/openai/gym/blob/master/gym/envs/__init__.py#L150). If you want to avoid this, you can import the environment class and initialize it yourself, or register a different variant as in the linked file.
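The time-limit behavior described above can be illustrated with a minimal sketch. This is not gym's actual `TimeLimit` source, and `CountingEnv` is a hypothetical stand-in for an environment that never terminates on its own; the point is only that `done` flips to `True` once the step cap is reached, regardless of the underlying state.

```python
class CountingEnv:
    """Hypothetical toy env that never terminates on its own."""
    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        return self.t, 0.0, False, {}

class TimeLimit:
    """Minimal sketch of a time-limit wrapper: force done=True
    once max_episode_steps have elapsed, even though the wrapped
    env has not reached a true terminal state."""
    def __init__(self, env, max_episode_steps=100):
        self.env = env
        self.max_episode_steps = max_episode_steps
        self._elapsed = 0

    def reset(self):
        self._elapsed = 0
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self._elapsed += 1
        if self._elapsed >= self.max_episode_steps:
            done = True  # truncation, not a true terminal state
        return obs, reward, done, info

env = TimeLimit(CountingEnv(), max_episode_steps=100)
env.reset()
for _ in range(100):
    obs, _, done, _ = env.step(0)
# done is True on the 100th step even though no hole or goal was reached
```

This is exactly why the repro code above sees `done = True` in a non-terminal state once `max_steps` reaches 100.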
@jkterry1 closable
(in MDPs, any state is terminal if you're brave enough)
From what I understand, `env.step` returns `observation, reward, done, info`. `done` is supposed to indicate whether the agent reached the goal or fell into a hole (a terminal state). But sometimes it returns `done = True` for non-terminal states. It can be reproduced with this code:

Expected output:

Actual output:
After some investigation, it appears that the environment returns `done = True` after it has stepped 100 times. Thus, if I make `max_steps` any number less than 100, this issue does not happen. However, this behaviour is not mentioned in the documentation:
https://gym.openai.com/envs/FrozenLake-v0/
https://github.com/openai/gym/wiki/FrozenLake-v0
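Given the 100-step cap, one way to collect only genuine terminal states is to count steps against the known limit and ignore a `done` that coincides with the cap. This is a sketch under that assumption; `StubEnv` is a hypothetical stand-in for FrozenLake (state 3 plays the role of a hole), and any object with the `reset()`/`step(action)` interface would work in its place.

```python
class StubEnv:
    """Hypothetical stand-in for FrozenLake: state 3 is a hole
    (a true terminal state); every step advances the state by one."""
    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        self.state += 1
        done = self.state == 3  # fell into the "hole"
        return self.state, 0.0, done, {}

TIME_LIMIT = 100  # the default limit discussed in this thread
terminal_states = set()
env = StubEnv()

for episode in range(5):
    env.reset()
    for step in range(TIME_LIMIT):
        state, _, done, _ = env.step(0)
        if done:
            if step + 1 < TIME_LIMIT:       # episode ended before the cap,
                terminal_states.add(state)  # so this is a genuine terminal
            break                           # either way, the episode is over
```

Later gym versions also mark truncation in `info`, but counting steps like this works regardless of version.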