Pendulum done = True after 200 steps

openai / gym

A toolkit for developing and comparing reinforcement learning algorithms.

https://www.gymlibrary.dev

Pendulum done = True after 200 steps #1437

Closed: briankim13 closed this issue 5 years ago

briankim13 commented 5 years ago

Hello all,

I found that the Pendulum environment returns done=True after 200 steps. Maybe the documentation should be changed (it says done will always be False)?

But I don't understand why I get done=True. Below is the step function from my copy of pendulum.py. It always returns False for done, so I shouldn't get True at all.

def step(self,u):
    th, thdot = self.state # th := theta

    g = 10.
    m = 1.
    l = 1.
    dt = self.dt

    u = np.clip(u, -self.max_torque, self.max_torque)[0]
    self.last_u = u # for rendering
    costs = angle_normalize(th)**2 + .1*thdot**2 + .001*(u**2)

    newthdot = thdot + (-3*g/(2*l) * np.sin(th + np.pi) + 3./(m*l**2)*u) * dt
    newth = th + newthdot*dt
    newthdot = np.clip(newthdot, -self.max_speed, self.max_speed) #pylint: disable=E1111

    self.state = np.array([newth, newthdot])
    return self._get_obs(), -costs, False, {}

muupan commented 5 years ago

That is because it is wrapped by the TimeLimit wrapper when you call gym.make.

https://github.com/openai/gym/blob/master/gym/wrappers/time_limit.py

You can unwrap it if you do not want done=True by accessing env.env.

>>> import gym
>>> env = gym.make('Pendulum-v0')
>>> env
<TimeLimit<PendulumEnv<Pendulum-v0>>>
>>> env = env.env
>>> env
<gym.envs.classic_control.pendulum.PendulumEnv object at 0x1038974a8>
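
To make the mechanism concrete, here is a minimal sketch of how a step-counting wrapper can turn done into True even though the inner environment never does. It does not use gym: ConstantEnv, SimpleTimeLimit, and steps_until_done are hypothetical names made up for illustration, and this is not gym's actual TimeLimit code. It assumes the older 4-tuple step return (obs, reward, done, info) shown in pendulum.py above.

```python
class ConstantEnv:
    """Hypothetical stand-in for PendulumEnv: step() never reports done."""

    def reset(self):
        return 0.0

    def step(self, action):
        # (observation, reward, done, info), with done hard-coded to False,
        # mirroring the return statement in the pendulum.py snippet.
        return 0.0, 0.0, False, {}


class SimpleTimeLimit:
    """Illustrative step-counting wrapper (an assumption, not gym's code)."""

    def __init__(self, env, max_episode_steps):
        self.env = env  # inner env stays reachable, like env.env in the thread
        self.max_episode_steps = max_episode_steps
        self.elapsed_steps = 0

    def reset(self):
        self.elapsed_steps = 0
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.elapsed_steps += 1
        # Override done once the step budget is used up.
        if self.elapsed_steps >= self.max_episode_steps:
            done = True
        return obs, reward, done, info


def steps_until_done(env, cap=1000):
    """Return the step index at which done first becomes True, or None."""
    env.reset()
    for t in range(1, cap + 1):
        _, _, done, _ = env.step(0.0)
        if done:
            return t
    return None


wrapped = SimpleTimeLimit(ConstantEnv(), max_episode_steps=200)
wrapped_result = steps_until_done(wrapped)        # wrapper ends the episode
inner_result = steps_until_done(wrapped.env)      # inner env never ends it
```

With a 200-step budget, the wrapped environment ends the episode at the step limit, while stepping the inner environment directly (the analogue of env.env) never yields done=True within the cap.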

christopherhesse commented 5 years ago

Thanks @muupan, that's correct. If you use the underlying environment directly there will be no time limit, but the version returned by gym.make has one.