openai / gym

A toolkit for developing and comparing reinforcement learning algorithms.
https://www.gymlibrary.dev

timestep_limit of MountainCar-v0 #336

Closed falcondai closed 7 years ago

falcondai commented 7 years ago

Currently in the MountainCar-v0 environment, the timestep_limit is 200, which makes learning very difficult: most initial policies will run out of time before reaching the goal and end up receiving the same reward (-200). Note that the solution threshold is ~~-195~~ -110, i.e. reaching the goal in ~~195~~ 110 timesteps. I would suggest increasing this limit.

I notice that this time limit is only enforced when monitoring is on. I wonder why such a limit is placed in the monitor, since it creates a difference between monitored and non-monitored environments. For performance comparison's sake, timestep counts might be a better measure.

tlbtlbtlb commented 7 years ago

I don't see -195 for a threshold anywhere: I believe it's -110.

Yes, the environment is hard and the timestep limit makes it harder. It's supposed to be challenging. Algorithms like https://gym.openai.com/evaluations/eval_DAj7EdpYTiO7m0H1f6xWw show that learning in this environment is possible.

You can enforce the timestep limit in your agent, or skip it if you want to experiment with longer trials. Most agents (such as the one linked above) do enforce it.
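For example, a minimal sketch of enforcing the limit in the agent's own loop (plain gym without the monitor; the random policy here is just a placeholder):

```python
import gym

env = gym.make('MountainCar-v0')
MAX_STEPS = 200  # enforce (or raise) the limit in the agent, not the env

obs = env.reset()
for t in range(MAX_STEPS):
    action = env.action_space.sample()  # placeholder: random policy
    obs, reward, done, info = env.step(action)
    if done:  # reached the goal before the self-imposed limit
        break
```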

falcondai commented 7 years ago

thanks for the response. ahh, you are right, the reward threshold is -110. hmm, interesting example submission. but the visualization on that submission seems off (strangely, the plotted line didn't pass the threshold).

sanjaythakur commented 7 years ago

Hi, I was trying to raise the maximum steps per episode on the MountainCar environment. I used this:

env = gym.make('MountainCar-v0')
env.max_episode_steps = 500

But it still remains capped at 200. I also tried creating a new register entry, but it gave me an 'UnregisteredEnv' error. Can anyone give me some idea of how to increase the upper bound on episode size? Thanks!!

falcondai commented 7 years ago

you might notice that unlike many other environments, MountainCar-v0 allows you to continue stepping even after an episode has ended: just ignore the returned done value.
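something like this (a sketch, assuming no monitor is attached, since it is the monitor that raises when you step past the end of an episode):

```python
import gym

env = gym.make('MountainCar-v0')
obs = env.reset()
for t in range(1000):  # run well past the 200-step limit
    obs, reward, done, info = env.step(env.action_space.sample())
    # deliberately ignore `done` and keep stepping
```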

sanjaythakur commented 7 years ago

Well, it is not allowing me to continue calling the 'step' function after the episode has taken 200 steps. It gives me the following error:

raise error.ResetNeeded("Trying to step environment which is currently done. While the monitor is active for {}, you cannot step beyond the end of an episode. Call 'env.reset()' to start the next episode.".format(self.env_id))
gym.error.ResetNeeded: Trying to step environment which is currently done. While the monitor is active for MountainCar-v0, you cannot step beyond the end of an episode. Call 'env.reset()' to start the next episode.

So, it is forcing me to call the 'reset' function. My problem is that I am starting off taking random actions so as to explore the environment. However, 200 steps are turning out to be too few to reach the goal and hence learn anything.

falcondai commented 7 years ago

i checked on the master branch (gym.__version__ = '0.7.4-dev') and it works fine without reset. as noted in the earlier discussion, it is possible to learn in this strict setting. MountainCar is a classic task for investigating the problem of exploration in RL. you are right that if an agent explores only by taking random actions, it is very unlikely to reach the goal in time, since it would often undo its gained momentum. but that is exactly the issue with so-called naive exploration.

sanjaythakur commented 7 years ago

Thanks for your replies. One of the ways worked. I edited '__init__.py' under 'gym/envs/' to increase the maximum allowed steps per episode. It took effect immediately.
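An alternative that avoids editing gym's source is to register a variant under a new id, e.g. (a sketch; the id 'MountainCarLong-v0' is made up here, and this assumes a gym version whose register accepts max_episode_steps):

```python
import gym
from gym.envs.registration import register

# hypothetical id; the entry point is gym's own classic-control MountainCar
register(
    id='MountainCarLong-v0',
    entry_point='gym.envs.classic_control:MountainCarEnv',
    max_episode_steps=1000,  # raised from the default 200
)

env = gym.make('MountainCarLong-v0')
```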

falcondai commented 7 years ago

@sanjaythakur i would recommend consulting example 8.2 in Reinforcement Learning: An Introduction by Sutton and Barto for a principled treatment.

sanjaythakur commented 7 years ago

Yeah, I too feel making an informed decision based on planning would help more. Thanks, will do that.

tlbtlbtlb commented 7 years ago

If you modify the gym environment without changing the name, please don't submit any results to the scoreboard, as they're not comparable with other people's scores.

sanjaythakur commented 7 years ago

@tlbtlbtlb , I'll keep that in mind.

shristi945 commented 6 years ago

@tlbtlbtlb Hi, can you help me with this? I am new to OpenAI Gym and have to create a new environment for an autonomous drone, hence I am defining the _step() and _reset() functions in my env class. This is the code for my environment (env_code) and I am getting these errors:

env_error

Please help me with these errors. Also, can you explain the action argument of the step function? We have to provide the action, and step will return observation, reward, and done, so why are we giving action as an argument? It would be helpful if I could get a quick reply. Thanks in advance.

falcondai commented 6 years ago

@shristi945 for basic questions/discussion, you might want to consult https://discuss.openai.com/ first and reserve issues for more technical, implementation-oriented things. action is the action chosen by your agent operating in the Env, and the environment changes depending on the action taken; that is why Env.step takes action as an argument. You can read more here and in various tutorials.
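a minimal interaction loop may make this concrete (a sketch using the classic gym API, with a random policy standing in for the agent):

```python
import gym

env = gym.make('MountainCar-v0')
obs = env.reset()
done = False
while not done:
    action = env.action_space.sample()  # the agent chooses an action...
    obs, reward, done, info = env.step(action)  # ...and the env evolves in response
```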

shristi945 commented 6 years ago

@falcondai Thanks for informing me about where to discuss basic things. I have resolved my problem now.

raul-mdelfin commented 5 years ago

The problem is specifically designed to be hard for policies that try to find the answer randomly, and it rewards methods built around exploration. If you increase the time limit, you are changing the environment and thus solving another problem. The same can be said for those who modify the reward function to reach a solution.

ZainBashir commented 5 years ago

> Hi, I was trying to raise the maximum steps per episode on the MountainCar environment. I used this:
>
> env = gym.make('MountainCar-v0')
> env.max_episode_steps = 500
>
> But it still remains capped at 200. I also tried creating a new register entry, but it gave me an 'UnregisteredEnv' error. Can anyone give me some idea of how to increase the upper bound on episode size? Thanks!!

Try this to initialize your environment:

env = gym.make('MountainCar-v0').env

This bypasses the TimeLimit wrapper, so episodes are no longer cut off at 200 steps.

When you visualize your learnt policy, initialize your environment normally:

env = gym.make('MountainCar-v0')

I don't know the reason yet, but my learnt policy works correctly only if I initialize my environment the normal way. Hope it works!
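Putting that together, a sketch (assuming a gym version where gym.make wraps the raw env in a TimeLimit wrapper):

```python
import gym

# training: the bare env, without the TimeLimit wrapper, so episodes
# are not cut off at 200 steps while the agent explores
train_env = gym.make('MountainCar-v0').env

# evaluation/visualization: the standard wrapped env, so results stay
# comparable with other people's 200-step runs
eval_env = gym.make('MountainCar-v0')
```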

QasimWani commented 4 years ago

If anyone needs any help, here's how you fix the TimeLimit error:

env_name = "Taxi-v3"
env = gym.make(env_name)
env = env.unwrapped  # gets rid of the TimeLimit wrapper