Closed: Joll123 closed this issue 4 years ago.
What does this formula mean in PPO? Thanks!
I can't remember clearly, but I think it is there for training stability. A similar question and answer can be found at the link below. Hope it answers your question!
https://datascience.stackexchange.com/questions/20098/why-do-we-normalize-the-discounted-rewards-when-doing-policy-gradient-reinforcem
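Assuming the formula in question is the common step of normalizing the discounted returns before the policy update (the topic of the linked answer), here is a minimal Python sketch. The helper names `discounted_returns` and `normalize` and the parameters `gamma` and `eps` are hypothetical and illustrative. They are not taken from any specific PPO codebase. The sketch accumulates returns backwards, resets them at episode ends, and then shifts them to zero mean and scales them to unit standard deviation.

```python
import statistics
from typing import List

# Hypothetical helpers: an illustrative sketch, not code from a specific PPO repo.


def discounted_returns(rewards: List[float], dones: List[bool], gamma: float = 0.99) -> List[float]:
    """Compute G_t = r_t + gamma * G_{t+1}, restarting at episode boundaries."""
    returns: List[float] = []
    running = 0.0
    # Walk backwards so each return can reuse the return of the following step.
    for reward, done in zip(reversed(rewards), reversed(dones)):
        if done:
            # The episode ended at this step, so nothing from the next episode leaks in.
            running = 0.0
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def normalize(values: List[float], eps: float = 1e-8) -> List[float]:
    """Shift values to zero mean and scale them to unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    # eps avoids division by zero when all values are equal.
    return [(v - mean) / (std + eps) for v in values]


if __name__ == "__main__":
    returns = discounted_returns([1.0, 1.0, 1.0], [False, False, True], gamma=0.5)
    print(returns)  # [1.75, 1.5, 1.0]
    print(normalize(returns))
```

One commonly cited reason this helps is that, after normalization, the scale of the update no longer depends on the raw reward magnitude, and actions with above-average returns are encouraged while those below average are discouraged. This tends to reduce the variance of the policy-gradient estimate. Note that `statistics.pstdev` is the population standard deviation. PyTorch's `Tensor.std()` uses the Bessel-corrected sample estimate by default, so results differ slightly for small batches.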