sweetice / Deep-reinforcement-learning-with-pytorch

PyTorch implementation of DQN, AC, ACER, A2C, A3C, PG, DDPG, TRPO, PPO, SAC, TD3 and ....
MIT License
3.88k stars, 844 forks

about the advantage values in PPO2 #30

Open Hardlygo opened 3 years ago

Hardlygo commented 3 years ago

I think the advantage value here should be based on the old actor: target_v = reward + args.gamma * self.critic_net(next_state)
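To make the suggestion concrete, here is a minimal sketch of one way to read it: compute target_v and the advantage once, from the networks as they were when the data was collected, and keep those numbers fixed while the update epochs run. This is an illustration, not the repository's code. `LinearCritic`, `td_targets`, `advantages` and the toy numbers are all hypothetical, and plain Python floats stand in for tensors so it runs without PyTorch.

```python
# Sketch only: a toy stand-in for self.critic_net, not the repo's network.
class LinearCritic:
    """Hypothetical critic with V(s) = w * s."""

    def __init__(self, w):
        self.w = w

    def __call__(self, state):
        return self.w * state


def td_targets(rewards, next_values, gamma):
    """One-step TD target: reward + gamma * V(next_state)."""
    return [r + gamma * nv for r, nv in zip(rewards, next_values)]


def advantages(targets, values):
    """Advantage estimate: TD target minus V(state)."""
    return [t - v for t, v in zip(targets, values)]


gamma = 0.9                      # plays the role of args.gamma
states = [1.0, 2.0]              # toy transitions (assumed data)
next_states = [2.0, 3.0]
rewards = [1.0, 0.0]

critic = LinearCritic(w=0.5)

# Evaluate the critic ONCE, before any update epoch, and freeze the results.
old_values = [critic(s) for s in states]
old_next_values = [critic(s) for s in next_states]
target_v = td_targets(rewards, old_next_values, gamma)
adv = advantages(target_v, old_values)

# Simulate a critic update in the middle of the PPO epochs.
critic.w = 0.8

# Recomputing with the updated critic gives different numbers, which is
# the drift the fixed `adv` above avoids.
new_target_v = td_targets(rewards, [critic(s) for s in next_states], gamma)
adv_recomputed = advantages(new_target_v, [critic(s) for s in states])

print("fixed advantage:     ", adv)
print("recomputed advantage:", adv_recomputed)
```

With tensors, the same idea would usually mean evaluating the critic under `torch.no_grad()` before the epoch loop, so the advantage is a constant in the actor loss.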