Open LiuShangYuan opened 5 years ago
Hello, I am confused about how R is calculated in PPO. In the file PPO_CartPole_v0.py, you compute R in the update function, but I think the rewards in the buffer may come from two different trajectories.
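To illustrate the concern, here is a minimal sketch of one way to compute discounted returns over a buffer that holds several trajectories. It is not the code from PPO_CartPole_v0.py: the function name `discounted_returns` and the per-step `dones` flags are assumptions, and it supposes the buffer records whether each step ended an episode. The running return is reset at each episode boundary, so rewards from a later trajectory do not leak into an earlier one.

```python
def discounted_returns(rewards, dones, gamma=0.99):
    """Compute discounted returns, resetting at episode boundaries.

    rewards: per-step rewards in the order they were collected.
    dones:   per-step flags (assumed to be stored in the buffer);
             True if that step was the last one of its episode.
    """
    returns = []
    running = 0.0
    # Walk the buffer backwards so each return builds on the next step's return.
    for reward, done in zip(reversed(rewards), reversed(dones)):
        if done:
            # This step ended an episode: nothing after it belongs to it.
            running = 0.0
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


# Two short episodes stored back to back in one buffer.
rewards = [1.0, 1.0, 1.0, 1.0]
dones = [False, True, False, True]
mixed = discounted_returns(rewards, [False] * 4, gamma=0.5)  # boundaries ignored
split = discounted_returns(rewards, dones, gamma=0.5)        # boundaries respected
```

With the boundaries ignored, the first step's return includes rewards from the second episode. With the `dones` flags, each episode's returns are computed independently. If the last trajectory in the buffer was cut off before the episode ended, its tail would usually need a bootstrapped value estimate, which this sketch leaves out.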