Open FabioPINO opened 2 years ago
I do not think there is an exhaustive document on this. For a closer match with Schulman's paper, check out the original baselines repository. I think there have been some small changes to PPO2 over the years, but nothing major (e.g. fixing off-by-one mistakes and such).
I think Costa's blog is currently the best resource covering all the implementation details that go into PPO: https://costa.sh/blog-the-32-implementation-details-of-ppo.html
But the best option now is also to look at the SB3 code ;)
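To give a rough idea of what "implementation details" means here, below is a minimal pure-Python sketch of three code-level choices commonly attributed to PPO2-style implementations: per-batch advantage normalization, the clipped policy loss, and a clipped value loss. The function names and the default `clip_range=0.2` are illustrative assumptions, not the actual baselines or SB3 API, so treat this as a sketch and check the real code for the authoritative version.

```python
import math


def clip(x, low, high):
    """Clamp x to the interval [low, high]."""
    return max(low, min(high, x))


def normalize_advantages(advs, eps=1e-8):
    """Rescale a batch of advantages to zero mean and (roughly) unit std."""
    mean = sum(advs) / len(advs)
    var = sum((a - mean) ** 2 for a in advs) / len(advs)
    std = math.sqrt(var)
    return [(a - mean) / (std + eps) for a in advs]


def clipped_policy_loss(logp_new, logp_old, advs, clip_range=0.2):
    """Clipped surrogate objective, written as a loss to minimize."""
    losses = []
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advs):
        ratio = math.exp(lp_new - lp_old)  # pi_new(a|s) / pi_old(a|s)
        unclipped = -adv * ratio
        clipped = -adv * clip(ratio, 1.0 - clip_range, 1.0 + clip_range)
        # Taking the max of the two losses is the pessimistic bound.
        losses.append(max(unclipped, clipped))
    return sum(losses) / len(losses)


def clipped_value_loss(v_new, v_old, returns, clip_range=0.2):
    """Value loss where the new prediction may not move far from the old one."""
    losses = []
    for vn, vo, ret in zip(v_new, v_old, returns):
        v_clipped = vo + clip(vn - vo, -clip_range, clip_range)
        losses.append(max((vn - ret) ** 2, (v_clipped - ret) ** 2))
    return 0.5 * sum(losses) / len(losses)


if __name__ == "__main__":
    advs = normalize_advantages([1.0, 2.0, 3.0])
    print(clipped_policy_loss([0.1, 0.0, -0.1], [0.0, 0.0, 0.0], advs))
    print(clipped_value_loss([1.0, 0.5], [0.0, 0.4], [1.0, 0.0]))
```

Of these, only the clipped policy loss is the core objective from the paper; as far as I know, the advantage normalization and value clipping are not described there, which is why they are usually counted as code-level details.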
Thank you for your prompt replies @Miffyli and @araffin! More specifically, what are the code-level optimizations used in PPO2? And in addition, how is the exploration carried out?
Where can I find the implementation details that differentiate the PPO2 algorithm from the original version reported in Proximal Policy Optimization Algorithms by Schulman?