I know that the recording of Policy, RMS, and Test Traj is done in the eval function.
I would like to ask the following questions.
Which code defines how often the eval function is called?
When the agent is trained with PPO with total_timesteps=int(5 * 1e7), training always finishes at iter_02000. Please tell me why this iteration number (2000) does not change.
Does "iteration" mean the number of episodes the agent has learned from, or the number of learning iterations the agent has performed?
The total number of iterations can be computed by dividing the total number of time steps by the number of sampled time steps collected at each iteration. Check here
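The division described above can be sketched in a few lines of Python. This is only an illustration: the function name `num_iterations` and the variable `timesteps_per_iteration` are hypothetical, the value 25,000 is back-computed from the observed iter_02000 rather than read from the code, and rounding up is an assumption about how a partial final iteration is handled.

```python
def num_iterations(total_timesteps: int, timesteps_per_iteration: int) -> int:
    """Iterations needed to consume total_timesteps (rounding up is an assumption)."""
    # Integer ceiling division, avoids floating-point rounding.
    return -(-total_timesteps // timesteps_per_iteration)


total_timesteps = int(5 * 1e7)  # value from the question

# Hypothetical: inferred from the observed final iteration (2000),
# not taken from the repository's configuration.
timesteps_per_iteration = total_timesteps // 2000

print(timesteps_per_iteration)
print(num_iterations(total_timesteps, timesteps_per_iteration))
```

If both total_timesteps and the number of time steps sampled per iteration are fixed, the final iteration number is fixed too, which would be consistent with training always ending at iter_02000.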
I think the log interval is defined here. Based on that, I expected the log interval in this code to be every 10 steps, but when I run the code the log interval of each episode is 50. Could you tell me why this happens?
I would also like to ask which code defines the "total number of sampled time steps".
In particular, I want to know how the final iteration number is calculated.
Thank you for the interesting challenge!