A collection of Deep Reinforcement Learning algorithms implemented with PyTorch to solve Atari games and classic control tasks like CartPole, LunarLander, and MountainCar.
According to the original PPO paper, the estimated returns and advantages are pre-computed for the unroll sequences and fixed across the K epoch updates.
However, this implementation recomputes the returns and advantages on the fly during each parameter update, so the targets can change between epochs. This may be incorrect and may need to be fixed.
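
As a rough illustration of the pre-computed variant, the sketch below computes returns and advantages once per unroll with GAE(λ) and then reuses them unchanged across the K epochs. It is not taken from this repository: the function name `compute_gae`, its arguments, the default `gamma` and `lam` values, and the convention that `dones[t]` marks an episode ending after step `t` are all assumptions made for the example. It uses plain Python lists to stay self-contained; the same recursion applies to PyTorch tensors.

```python
from typing import List, Sequence, Tuple


def compute_gae(
    rewards: Sequence[float],
    values: Sequence[float],
    dones: Sequence[bool],
    last_value: float,
    gamma: float = 0.99,
    lam: float = 0.95,
) -> Tuple[List[float], List[float]]:
    """Return (returns, advantages) for one unroll using GAE(lambda).

    Hypothetical helper: `values[t]` is the value estimate recorded when the
    unroll was collected, `last_value` is the bootstrap value for the state
    after the final step, and `dones[t]` is True if the episode ended after
    step t (assumed convention).
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    next_value = last_value
    # Walk backwards so each step can reuse the accumulated advantage.
    for t in reversed(range(len(rewards))):
        not_done = 0.0 if dones[t] else 1.0
        # One-step TD error; bootstrapping is cut at episode boundaries.
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
        next_value = values[t]
    # The return target is the advantage plus the recorded value estimate.
    returns = [adv + val for adv, val in zip(advantages, values)]
    return returns, advantages


# Compute the targets once per unroll, before the epoch loop starts.
returns, advantages = compute_gae(
    rewards=[1.0, 1.0, 1.0],
    values=[0.5, 0.5, 0.5],
    dones=[False, False, True],
    last_value=0.0,
)

num_epochs = 4  # K in the PPO paper; the value here is arbitrary
for _ in range(num_epochs):
    # A real update would sample minibatches and take gradient steps here,
    # reading `returns` and `advantages` without recomputing them.
    fixed_targets = (returns, advantages)
```

The point of the design is that the value estimates used for the targets come from the parameters that collected the data, so every epoch optimizes against the same targets even as the network changes.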