PPO returns and advantages should be fixed

michaelnny / deep_rl_zoo

A collection of Deep Reinforcement Learning algorithms implemented with PyTorch to solve Atari games and classic control tasks like CartPole, LunarLander, and MountainCar.

Apache License 2.0


PPO returns and advantages should be fixed #5

Closed michaelnny closed 1 year ago

michaelnny commented 1 year ago

According to the original PPO paper, the estimated returns and advantages are computed once per unroll sequence, before the update phase, and then held fixed across the K epochs of updates.

However, this implementation recomputes the returns and advantages on the fly during each parameter update. That is probably incorrect and should be fixed.
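To make the intended behaviour concrete, here is a minimal sketch in plain Python, not the repository's actual code. It assumes GAE as the advantage estimator, and the names `compute_gae`, `ppo_update`, and `update_fn` are hypothetical. The point is only that the targets are computed once, outside the epoch loop, from the value predictions made when the data was collected.

```python
from typing import Callable, List, Sequence, Tuple


def compute_gae(
    rewards: Sequence[float],
    values: Sequence[float],
    dones: Sequence[bool],
    last_value: float,
    discount: float = 0.99,
    gae_lambda: float = 0.95,
) -> Tuple[List[float], List[float]]:
    """Return (advantages, returns) for one unroll sequence using GAE."""
    advantages = [0.0] * len(rewards)
    gae = 0.0
    next_value = last_value  # bootstrap value for the state after the unroll
    for t in reversed(range(len(rewards))):
        not_done = 0.0 if dones[t] else 1.0  # stop bootstrapping at episode end
        delta = rewards[t] + discount * next_value * not_done - values[t]
        gae = delta + discount * gae_lambda * not_done * gae
        advantages[t] = gae
        next_value = values[t]
    # Return targets for the value function: advantage plus baseline value.
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns


def ppo_update(
    rewards: Sequence[float],
    values: Sequence[float],
    dones: Sequence[bool],
    last_value: float,
    num_epochs: int,
    update_fn: Callable[[List[float], List[float]], None],
) -> None:
    """Run K epochs of updates against targets that are computed only once."""
    # Computed once, before the epoch loop, from the values recorded at
    # collection time (hypothetical layout of the unroll data).
    advantages, returns = compute_gae(rewards, values, dones, last_value)
    for _ in range(num_epochs):
        # update_fn stands in for one pass of minibatch gradient steps.
        # It must not recompute advantages or returns with the updated network.
        update_fn(advantages, returns)
```

Recomputing inside the loop would instead use value predictions from a network that has already been updated on the same data, so the targets would shift from epoch to epoch.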

michaelnny commented 1 year ago

fixed