Closed YilunZhou closed 4 years ago
No, timestep is not used to prevent the episode from running too long; max_timesteps ensures that. timestep counts the number of timesteps since the last update, because we perform an update after a fixed number of timesteps (update_timestep). Note that different episodes can have different lengths, but the update is always performed after a fixed number of timesteps.
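To make the distinction concrete, here is a minimal sketch of the two counters. It is not the code from PPO.py: the function name run_training, the random episode lengths, and the returned list of update points are assumptions made for illustration, and the actual PPO update is replaced by recording the step at which it would fire.

```python
import random


def run_training(max_episodes=20, max_timesteps=10, update_timestep=7, seed=0):
    """Toy loop; returns the global step counts at which an update would fire."""
    rng = random.Random(seed)
    timestep = 0       # steps since the last update; carries over across episodes
    total_steps = 0    # bookkeeping for this sketch only
    update_points = []

    for _ in range(max_episodes):
        # Hypothetical stand-in for an environment: the episode terminates
        # on its own after a random number of steps.
        episode_len = rng.randint(1, max_timesteps)

        # max_timesteps is what caps the length of a single episode.
        for t in range(max_timesteps):
            timestep += 1
            total_steps += 1

            # The update depends only on steps since the last update,
            # not on where the current episode started or ended.
            if timestep % update_timestep == 0:
                update_points.append(total_steps)  # an update would run here
                timestep = 0                       # restart the count after the update

            if t + 1 >= episode_len:  # environment reported "done"
                break

    return update_points
```

In this sketch the updates land at every multiple of update_timestep in global steps, regardless of how the steps are split into episodes, while no single episode ever runs longer than max_timesteps.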
The timestep should not be reset on line 182 in PPO.py, because it is used to prevent the episode from running too long. Currently it does not affect the performance, as the episode length will not exceed max_timesteps, but the logic is wrong. The same applies to the continuous case as well.