ikostrikov / pytorch-a2c-ppo-acktr-gail

PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) and Generative Adversarial Imitation Learning (GAIL).
MIT License
3.53k stars, 832 forks

fix value loss coefficient #287

Open dmitrySorokin opened 2 years ago

dmitrySorokin commented 2 years ago

The value loss coefficient (0.5) seems to be applied twice: once at line 79 and once at lines 74 / 76.
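To illustrate the concern, here is a minimal plain-Python sketch. It assumes the value loss is first scaled by a hard-coded 0.5 and then multiplied by a configurable coefficient that also defaults to 0.5. All names (`VALUE_LOSS_COEF`, `mean_squared_error`, `value_term_scaled_twice`, `value_term_scaled_once`) are hypothetical stand-ins and are not copied from the repository.

```python
# Hypothetical stand-in for the configurable coefficient (assumed default 0.5).
VALUE_LOSS_COEF = 0.5


def mean_squared_error(values, returns):
    """Mean of squared differences between predicted values and returns."""
    return sum((v - r) ** 2 for v, r in zip(values, returns)) / len(values)


def value_term_scaled_twice(values, returns, coef=VALUE_LOSS_COEF):
    """Value term with a hard-coded 0.5 plus the configurable coefficient."""
    value_loss = 0.5 * mean_squared_error(values, returns)  # first 0.5
    return coef * value_loss  # second 0.5, effective weight 0.25


def value_term_scaled_once(values, returns, coef=VALUE_LOSS_COEF):
    """Value term where only the configurable coefficient is applied."""
    value_loss = mean_squared_error(values, returns)
    return coef * value_loss  # effective weight 0.5


# Toy numbers, chosen only for illustration.
values = [1.0, 2.0, 3.0]
returns = [2.0, 2.0, 5.0]

twice = value_term_scaled_twice(values, returns)
once = value_term_scaled_once(values, returns)
print(f"scaled twice: {twice:.4f}, scaled once: {once:.4f}")
```

Under these assumptions the value term ends up weighted by 0.25 rather than 0.5, so its gradient is half as large as the configured coefficient suggests. Whether the extra 0.5 is intended (for example, as the conventional 1/2 factor on a squared error) is a question for the maintainers.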