sweetice / Deep-reinforcement-learning-with-pytorch

PyTorch implementation of DQN, AC, ACER, A2C, A3C, PG, DDPG, TRPO, PPO, SAC, TD3 and ....
MIT License
3.88k stars 844 forks

Char 05 DDPG: step index and episode index #5

Closed xiangzz closed 5 years ago

xiangzz commented 5 years ago

for i in range(args.num_iteration):
    state = env.reset()
    for t in range(args.max_episode):

From the code above, one would infer that i is the step index and t is the episode index. However, the code also contains print('Episode {}, The memory size is {} '.format(i, len(agent.replay_buffer.storage))), which uses i to count episodes.

So should args.max_episode and args.num_iteration be swapped?

sweetice commented 5 years ago

Thanks for raising this issue. Note that i is the episode index, and t is the step index within one trajectory, i.e., the agent stops a rollout once t reaches args.max_episode. The argument names may mislead programmers, though; this has been fixed in the new version.
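To make the two loop levels concrete, here is a minimal sketch of the same structure with less ambiguous names. It is not the repository's actual fix: ToyEnv, run, num_episodes and max_steps_per_episode are hypothetical names chosen for illustration, and the environment is a stand-in that simply ends an episode after a fixed number of steps.

```python
import random


class ToyEnv:
    """Hypothetical stand-in for a Gym-style environment (not from the repository)."""

    def __init__(self, episode_length=5):
        self.episode_length = episode_length
        self.steps_taken = 0

    def reset(self):
        self.steps_taken = 0
        return 0.0  # dummy initial state

    def step(self, action):
        self.steps_taken += 1
        done = self.steps_taken >= self.episode_length
        return float(self.steps_taken), 1.0, done  # next_state, reward, done


def run(env, num_episodes, max_steps_per_episode):
    """Outer loop counts episodes; inner loop counts steps within one episode."""
    episode_lengths = []
    for episode in range(num_episodes):            # plays the role of `i`
        state = env.reset()
        steps = 0
        for step in range(max_steps_per_episode):  # plays the role of `t`
            action = random.random()               # placeholder policy
            state, reward, done = env.step(action)
            steps += 1
            if done:
                break
        episode_lengths.append(steps)
    return episode_lengths


if __name__ == "__main__":
    print(run(ToyEnv(episode_length=5), num_episodes=3, max_steps_per_episode=10))
```

With these names, the outer bound clearly limits how many episodes are run and the inner bound caps the length of a single rollout, so a message such as 'Episode {}' formatted with the outer counter reads naturally.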