Closed: xiangzz closed this issue 5 years ago
Thanks for raising this issue. Note that i counts episodes and t counts the steps within one trajectory, i.e., the agent stops a rollout once t reaches args.max_episode. The argument names are misleading to programmers, though; this has been fixed in the new version.
for i in range(args.num_iteration):
    state = env.reset()
    for t in range(args.max_episode):
From the code above, we can infer that i stands for the i-th step and t stands for the t-th episode. However, the code shows in the line:
print('Episode {}, The memory size is {} '.format(i, len(agent.replay_buffer.storage)))
that i is used for counting episodes. So do we need to swap the positions of args.max_episode and args.num_iteration?
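To make the loop roles concrete, here is a minimal sketch of the same nested loop with less ambiguous names. The names num_episodes and max_steps_per_episode, the run helper, and the DummyEnv class are all hypothetical and chosen only for illustration; they are not taken from the repository, and the actual fix in the new version may look different.

```python
import argparse


class DummyEnv:
    """Hypothetical stand-in environment; an episode ends after `horizon` steps."""

    def __init__(self, horizon=5):
        self.horizon = horizon
        self.steps = 0

    def reset(self):
        self.steps = 0
        return 0  # dummy initial state

    def step(self, action):
        self.steps += 1
        done = self.steps >= self.horizon
        return self.steps, 0.0, done  # (next_state, reward, done)


def run(args, env):
    """Run the nested loop and return the number of steps taken per episode."""
    steps_per_episode = []
    # Outer loop: one pass per episode (the role of i in the original code).
    for episode in range(args.num_episodes):
        state = env.reset()
        steps = 0
        # Inner loop: steps within one rollout (the role of t in the original code).
        for t in range(args.max_steps_per_episode):
            state, reward, done = env.step(action=0)
            steps += 1
            if done:
                break
        steps_per_episode.append(steps)
        print('Episode {}, steps taken: {}'.format(episode, steps))
    return steps_per_episode


# Assumed argument names, not the repository's actual flags.
args = argparse.Namespace(num_episodes=3, max_steps_per_episode=4)
result = run(args, DummyEnv(horizon=5))
```

With these names, the outer bound reads as a count of episodes and the inner bound as a cap on rollout length, which matches the behaviour described in the reply without swapping the two arguments.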