PPO2 implementation details?

hill-a / stable-baselines

A fork of OpenAI Baselines, implementations of reinforcement learning algorithms

http://stable-baselines.readthedocs.io/

MIT License


PPO2 implementation details? #1140

Open FabioPINO opened 2 years ago

FabioPINO commented 2 years ago

Where can I find the implementation details that differentiate the PPO2 algorithm from the original version described in "Proximal Policy Optimization Algorithms" by Schulman et al.?

Miffyli commented 2 years ago

I do not think there is an exhaustive document on this. For a closer match with Schulman's paper, check out the original baselines repository. I think there have been some small changes to PPO2 over the years, but nothing major (e.g. fixing off-by-one mistakes and such).

araffin commented 2 years ago

I think Costa's blog is currently the best resource for all the implementation details that are in PPO: https://costa.sh/blog-the-32-implementation-details-of-ppo.html

But the best option now is also to look at the SB3 code ;)
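To give a flavor of what "implementation details" means here, below is a minimal sketch in plain Python of the clipped surrogate loss from the PPO paper, together with two code-level details that are commonly cited for PPO implementations: per-minibatch advantage normalization and value-function clipping. This is an illustration only, not code taken from stable-baselines or SB3. The function name `ppo_losses`, its arguments, and the default `clip_range=0.2` are assumptions made for this example.

```python
import math


def ppo_losses(new_logp, old_logp, advantages, values, old_values, returns,
               clip_range=0.2):
    """Illustrative PPO losses on one minibatch (hypothetical helper).

    All arguments are equal-length lists of floats. Returns
    (policy_loss, value_loss), both meant to be minimized.
    """
    n = len(advantages)

    # Detail 1 (assumed here): normalize advantages within the minibatch.
    mean_adv = sum(advantages) / n
    std_adv = math.sqrt(sum((a - mean_adv) ** 2 for a in advantages) / n)
    norm_adv = [(a - mean_adv) / (std_adv + 1e-8) for a in advantages]

    # Clipped surrogate objective from the paper, written as a loss.
    policy_terms = []
    for lp_new, lp_old, adv in zip(new_logp, old_logp, norm_adv):
        ratio = math.exp(lp_new - lp_old)  # pi_new(a|s) / pi_old(a|s)
        clipped_ratio = min(max(ratio, 1.0 - clip_range), 1.0 + clip_range)
        # Taking the max of the negated terms equals the paper's min.
        policy_terms.append(max(-adv * ratio, -adv * clipped_ratio))
    policy_loss = sum(policy_terms) / n

    # Detail 2 (assumed here): clip the value update around the old value
    # prediction and keep the larger (more pessimistic) squared error.
    value_terms = []
    for v, v_old, ret in zip(values, old_values, returns):
        v_clipped = v_old + min(max(v - v_old, -clip_range), clip_range)
        value_terms.append(max((v - ret) ** 2, (v_clipped - ret) ** 2))
    value_loss = 0.5 * sum(value_terms) / n

    return policy_loss, value_loss
```

When the new and old log-probabilities are equal, the ratio is 1 and clipping has no effect. Clipping only matters once the policy has moved away from the one that collected the data. Whether a given library applies these two details, and with which defaults, is best checked in its source.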

FabioPINO commented 2 years ago

Thank you for your prompt replies @Miffyli and @araffin! More specifically, what are the code-level optimizations used in PPO2? In addition, how is exploration carried out?