Possible issue with policy decay in TD3.

sweetice / Deep-reinforcement-learning-with-pytorch

PyTorch implementation of DQN, AC, ACER, A2C, A3C, PG, DDPG, TRPO, PPO, SAC, TD3 and ....

MIT License

3.75k stars, 837 forks

Closed by WillBrennan 5 years ago

WillBrennan commented 5 years ago

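Hi there,

I was looking through the TD3 code, and I'm not sure the delayed policy update check is correct. You're checking whether num_iterations % policy_decay is zero, but that tests a value that stays the same on every pass through the loop rather than the loop index.

I believe this should instead be: if i % args.policy_delay == 0: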
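To illustrate the difference, here is a minimal sketch. It assumes the check sits inside a training loop over i; the function names and the returned list of steps are hypothetical and not taken from the repository, and the actual critic and actor updates are left out.

```python
def actor_update_steps(num_iterations, policy_delay):
    """Return the loop indices at which the actor would be updated
    when the check uses the loop index i (the proposed fix)."""
    steps = []
    for i in range(num_iterations):
        # The critic update would run here on every iteration.
        if i % policy_delay == 0:
            # Delayed actor / target-network update would run here.
            steps.append(i)
    return steps


def actor_update_steps_constant_check(num_iterations, policy_delay):
    """Same loop, but the check uses the total iteration count,
    which never changes inside the loop (the suspected bug)."""
    steps = []
    for i in range(num_iterations):
        if num_iterations % policy_delay == 0:
            steps.append(i)
    return steps
```

With 10 iterations and a delay of 2, the first function selects steps 0, 2, 4, 6 and 8. The second selects every step when the total happens to be divisible by the delay, and no step at all otherwise, so the actor is never actually delayed relative to the critic.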

sweetice commented 5 years ago

Thanks for your issue. You are right. This bug has been fixed.