sweetice / Deep-reinforcement-learning-with-pytorch

PyTorch implementation of DQN, AC, ACER, A2C, A3C, PG, DDPG, TRPO, PPO, SAC, TD3 and ....
MIT License
3.88k stars 844 forks

confused about the calculation of R in PPO #4

Open LiuShangYuan opened 5 years ago

LiuShangYuan commented 5 years ago

Hello, I am confused about the calculation of R in PPO. In the file PPO_CartPole_v0.py, you calculate R in the function update, but I think the rewards in the buffer may come from two different trajectories.
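A minimal sketch can illustrate the concern. It does not reproduce the repository's `update` code. It assumes the buffer also stores a per-step done flag (the name `dones` and both helper functions are hypothetical). It then compares a single backward pass over the whole buffer with a pass that resets the running return at each episode boundary.

```python
from typing import List


def naive_returns(rewards: List[float], gamma: float) -> List[float]:
    """Discounted returns computed as if the whole buffer were one episode.

    If the buffer holds steps from several episodes, the last steps of one
    episode wrongly accumulate rewards from the episode stored after it.
    """
    returns = []
    running = 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def episode_aware_returns(rewards: List[float], dones: List[bool], gamma: float) -> List[float]:
    """Discounted returns that restart at every episode boundary.

    dones[t] is True when step t was the last step of its episode
    (hypothetical flag; the buffer would have to record it).
    """
    if len(rewards) != len(dones):
        raise ValueError("rewards and dones must have the same length")
    returns = []
    running = 0.0
    for r, done in zip(reversed(rewards), reversed(dones)):
        if done:
            running = 0.0  # do not carry return across the episode boundary
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


if __name__ == "__main__":
    # Two CartPole-style episodes (+1 reward per step): 2 steps, then 3 steps.
    rewards = [1.0, 1.0, 1.0, 1.0, 1.0]
    dones = [False, True, False, False, True]
    gamma = 0.5
    print(naive_returns(rewards, gamma))           # [1.9375, 1.875, 1.75, 1.5, 1.0]
    print(episode_aware_returns(rewards, dones, gamma))  # [1.5, 1.0, 1.75, 1.5, 1.0]
```

In the naive pass, the last step of the first episode gets a return of 1.875 instead of 1.0 because it absorbs the second episode's rewards. If the buffer ends in the middle of an episode, the final partial episode is still truncated. A common alternative is to bootstrap from a value estimate at that point, which this sketch does not do.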