nikhilbarhate99 / PPO-PyTorch

Minimal implementation of clipped objective Proximal Policy Optimization (PPO) in PyTorch
MIT License

resetting timestep wrong? #11

Closed YilunZhou closed 4 years ago

YilunZhou commented 4 years ago

The timestep should not be reset on line 182 in PPO.py, because it is used to prevent the episode from running too long. Currently this does not affect performance, since the episode length never exceeds max_timesteps, but the logic is wrong. The same applies to the continuous case.

nikhilbarhate99 commented 4 years ago

No, timestep is not used to prevent the episode from running too long; max_timesteps ensures that. timestep counts the number of timesteps since the last update, because we perform an update after a fixed number of timesteps (update_timestep). Note that different episodes can have different lengths, but the update is always performed after a fixed number of timesteps.
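A minimal sketch of the loop logic being described, with the environment and PPO update stubbed out (the small constants and the `update_points` bookkeeping are illustrative only; `update_timestep` and `max_timesteps` are the names from the discussion). The point it demonstrates: `timestep` accumulates across episode boundaries and is reset only after an update, while `max_timesteps` alone bounds each episode's length.

```python
# Illustrative constants, chosen small so updates cross episode boundaries.
update_timestep = 7   # run a PPO update every 7 environment steps
max_timesteps = 5     # hard cap on a single episode's length
max_episodes = 4      # number of episodes for this demo

timestep = 0          # counts steps since the LAST update, not episode length
global_step = 0
update_points = []    # global steps at which an update fires (for inspection)

for episode in range(max_episodes):
    for t in range(max_timesteps):  # max_timesteps alone limits the episode
        timestep += 1
        global_step += 1
        if timestep % update_timestep == 0:
            # ppo.update(memory) would run here in the real training loop
            update_points.append(global_step)
            timestep = 0            # reset: start counting toward next update
        # env.step(...) and the done-check would go here

print(update_points)  # updates at steps 7 and 14, mid-episode both times
```

With episodes of length 5 and updates every 7 steps, the updates land at global steps 7 and 14, inside episodes 2 and 3 respectively, which is exactly why resetting `timestep` after each update (rather than per episode) is the intended behavior.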