sweetice / Deep-reinforcement-learning-with-pytorch

PyTorch implementation of DQN, AC, ACER, A2C, A3C, PG, DDPG, TRPO, PPO, SAC, TD3 and ....
MIT License

Possible issue with policy delay in TD3. #8

Closed WillBrennan closed 5 years ago

WillBrennan commented 5 years ago

Hi there,

I was looking through the code, and I'm not sure this is correct. You're checking whether num_iterations % policy_delay is zero:

https://github.com/sweetice/Deep-reinforcement-learning-with-pytorch/blob/a4f458dde7659654fcae4635d25f6bd05a5d2d6c/Char10%20TD3/TD3_BipedalWalker-v2.py#L212

I believe this should instead be: if i % args.policy_delay == 0:

sweetice commented 5 years ago

Thanks for your issue. You are right. This bug has been fixed.
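
For readers following along, here is a minimal sketch of why the choice of counter matters for TD3's delayed policy updates. It is not the repository's code. The function name count_actor_updates and the flag use_loop_index are hypothetical, and the loop shape is an assumption inferred from the suggested fix, which uses a loop index i. If the check uses the total iteration count, its result is the same on every pass through the loop, so the actor is updated either on every iteration or never. If it uses the loop index, the actor is updated once every policy_delay iterations.

```python
def count_actor_updates(num_iteration, policy_delay, use_loop_index):
    """Count the actor (policy) updates performed in one training call.

    Hypothetical helper for illustration only; it assumes a loop of
    num_iteration steps in which the critic is updated on every step.
    """
    actor_updates = 0
    for i in range(num_iteration):
        # A critic update would happen here on every iteration.

        # Variant described in the issue: test the loop bound, which
        # never changes inside the loop. Suggested fix: test the index i.
        counter = i if use_loop_index else num_iteration
        if counter % policy_delay == 0:
            # The actor and target networks would be updated here.
            actor_updates += 1
    return actor_updates


# Loop-bound check: all-or-nothing, depending on the bound's divisibility.
bound_even = count_actor_updates(10, 2, use_loop_index=False)  # 10
bound_odd = count_actor_updates(11, 2, use_loop_index=False)   # 0

# Loop-index check: one actor update per policy_delay iterations.
index_even = count_actor_updates(10, 2, use_loop_index=True)   # 5
index_odd = count_actor_updates(11, 2, use_loop_index=True)    # 6

print(bound_even, bound_odd, index_even, index_odd)
```

With a delay of 2, the loop-bound check gives 10 or 0 actor updates depending on whether the bound is even or odd, while the loop-index check gives roughly half as many actor updates as critic updates, which is the intended delay.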