Advantages should be computed every ppo epoch?

vwxyzjn / ppo-implementation-details

The source code for the blog post "The 37 Implementation Details of Proximal Policy Optimization"

https://iclr-blog-track.github.io/2022/03/25/ppo-implementation-details/

Advantages should be computed every ppo epoch? #5

Open yanghoonkim opened 1 year ago

yanghoonkim commented 1 year ago

Thanks for the great implementation. I noticed that in ppo_continuous the advantage is computed only once, right after the rollout. Shouldn't it be recomputed inside the PPO epoch loop?
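For context, here is a minimal sketch of the two placements the question contrasts. It is not the repository's code: `compute_gae`, the toy rollout data, and the done-flag convention (`dones[t]` marks that step `t` starts a new episode) are all assumptions made for illustration, and plain Python lists stand in for tensors.

```python
def compute_gae(rewards, values, next_value, dones, next_done,
                gamma=0.99, gae_lambda=0.95):
    """Generalized Advantage Estimation over one rollout (hypothetical helper).

    Assumes dones[t] == 1.0 means the observation at step t began a new
    episode, so the bootstrap from step t-1 into step t is cut off.
    """
    num_steps = len(rewards)
    advantages = [0.0] * num_steps
    last_gae = 0.0
    for t in reversed(range(num_steps)):
        if t == num_steps - 1:
            next_nonterminal = 1.0 - next_done
            next_val = next_value
        else:
            next_nonterminal = 1.0 - dones[t + 1]
            next_val = values[t + 1]
        # One-step TD error, then the discounted running sum of TD errors.
        delta = rewards[t] + gamma * next_val * next_nonterminal - values[t]
        last_gae = delta + gamma * gae_lambda * next_nonterminal * last_gae
        advantages[t] = last_gae
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns


# Toy rollout data, made up for illustration.
rewards = [1.0, 1.0, 1.0]
dones = [0.0, 0.0, 0.0]
values = [0.5, 0.4, 0.3]
next_value, next_done = 0.2, 0.0
update_epochs = 2

# Placement A (what the question describes): once, right after the rollout.
adv_once, ret_once = compute_gae(rewards, values, next_value, dones, next_done)
for epoch in range(update_epochs):
    pass  # minibatch updates would reuse adv_once / ret_once here

# Placement B (what the question proposes): inside the epoch loop.
adv_per_epoch = []
for epoch in range(update_epochs):
    # A real version would re-run the updated critic on the stored
    # observations here to refresh `values`; this toy keeps them fixed.
    adv, ret = compute_gae(rewards, values, next_value, dones, next_done)
    adv_per_epoch.append(adv)
```

With the value estimates held fixed, as in this toy, both placements produce identical advantages, since GAE depends only on the rewards, the done flags, and the values. The two differ only if the values are re-evaluated with the updated critic between epochs.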