Which code defines "iteration" in the eval function in ppo.py when learning

uzh-rpg / agile_flight

Developing and Comparing Vision-based Algorithms for Vision-based Agile Flight

MIT License


Closed HarukiKozukapenguin closed 2 years ago

HarukiKozukapenguin commented 2 years ago

Thank you for the interesting challenge!

I understand that the Policy, RMS, and Test Traj are recorded in the eval function.

I would like to ask the following questions.

1. Which code defines how often the eval function is called?
2. When the agent is trained with PPO with total_timesteps=int(5 * 1e7), training always finishes at iter_02000. Could you tell me why this iteration number (2000) does not change?
3. Does "iteration" mean the number of episodes the agent has learned from, or the number of learning iterations the agent has completed?

yun-long commented 2 years ago

1. The eval function is called here
2. The total number of iterations can be computed by dividing the total time steps by the number of time steps sampled at each iteration. Check here
3. See 2.
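The division described in point 2 can be sketched as below. This is only an illustration, not the repository's code: `num_iterations`, `n_steps`, and `n_envs` are hypothetical names, and the values 250 and 100 are assumptions chosen so that their product is 25,000, which is what the figures in this thread imply (5e7 total time steps / 2000 iterations). The sketch also assumes that training stops once at least `total_timesteps` samples have been collected.

```python
def num_iterations(total_timesteps: int, n_steps: int, n_envs: int) -> int:
    """Iterations needed to collect at least `total_timesteps` samples.

    Each iteration is assumed to collect n_steps samples from each of
    n_envs parallel environments (hypothetical parameter names).
    """
    steps_per_iteration = n_steps * n_envs
    # Integer ceiling division: a partially filled last iteration still counts.
    return -(-total_timesteps // steps_per_iteration)


total_timesteps = int(5 * 1e7)  # value quoted in the question
n_steps = 250                   # assumed rollout length per environment
n_envs = 100                    # assumed number of parallel environments

iterations = num_iterations(total_timesteps, n_steps, n_envs)
print(f"iter_{iterations:05d}")  # iter_02000
```

Under these assumptions the final iteration number depends only on `total_timesteps` and the number of samples per iteration, so it stays at 2000 until one of those values changes.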

HarukiKozukapenguin commented 2 years ago

@yun-long Thank you for answering my questions.

I think the log interval is defined here. That suggests the log interval is every 10 steps in this code, but when I run the code, the log interval I observe is 50. Could you tell me why this happens? I would also like to know which code defines the "total number of sampled time steps". In particular, I want to know how to calculate the final iteration number.