uidilr / gail_ppo_tf

TensorFlow implementation of Generative Adversarial Imitation Learning (GAIL) with discrete actions
MIT License

gaes = (gaes - gaes.mean()) / gaes.std() #22

Closed: Joll123 closed this issue 4 years ago

Joll123 commented 4 years ago

What does this formula mean in PPO? Thanks

uidilr commented 4 years ago

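I can't remember clearly, but I think it is for stability: the line rescales the advantage estimates (gaes) to zero mean and unit standard deviation, which keeps the scale of the policy gradient roughly consistent from batch to batch. A similar question and answer can be found at the link below. Hope it answers your question!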

https://datascience.stackexchange.com/questions/20098/why-do-we-normalize-the-discounted-rewards-when-doing-policy-gradient-reinforcem
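
For illustration, here is a minimal NumPy sketch of what the line in the issue title does to a batch of advantage estimates. The function name `normalize_advantages`, the `eps` term, and the sample values are assumptions added for this example; the line in the title divides by `gaes.std()` directly, with no epsilon.

```python
import numpy as np

def normalize_advantages(gaes, eps=1e-8):
    """Standardize advantage estimates to zero mean and unit standard deviation.

    eps is a hypothetical safeguard (not in the original line) that avoids
    division by zero when every advantage in the batch is identical.
    """
    gaes = np.asarray(gaes, dtype=np.float64)
    return (gaes - gaes.mean()) / (gaes.std() + eps)

# Made-up advantage values with a large outlier.
raw = np.array([10.0, 12.0, 8.0, 30.0])
norm = normalize_advantages(raw)

print("mean:", norm.mean())  # close to 0
print("std: ", norm.std())   # close to 1
```

After normalization, roughly half of the actions in the batch get a positive advantage and half a negative one, and the ranking of the actions is unchanged. Only the offset and scale change, so a single batch with unusually large returns cannot dominate the update.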