vwxyzjn / cleanrl

High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)
http://docs.cleanrl.dev

Target network isn't updated to the correct frequency when `target_network_frequency % train_frequency != 0` #322

Closed: qgallouedec closed this issue 1 year ago.

qgallouedec commented 1 year ago

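The `if` statement that updates the target network

https://github.com/vwxyzjn/cleanrl/blob/c37a3ec4ef8d33ab7c8a69d4d2714e3817739365/cleanrl/dqn.py#L205

is nested inside this one, which gates training on `train_frequency`:

https://github.com/vwxyzjn/cleanrl/blob/c37a3ec4ef8d33ab7c8a69d4d2714e3817739365/cleanrl/dqn.py#L185

Consequently, the target network is only updated when both `global_step % train_frequency == 0` and `global_step % target_network_frequency == 0` hold. In other words, the effective update period is the least common multiple of `train_frequency` and `target_network_frequency`, which equals `target_network_frequency` only when `target_network_frequency % train_frequency == 0`.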
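A minimal simulation can make the difference concrete. The sketch below is not the cleanrl code: the function `target_update_steps` is hypothetical, it counts steps from 1, and it ignores `learning_starts` for simplicity. `nested=True` mimics the reported control flow, and `nested=False` shows one possible fix, where the target-update check is evaluated independently of the training check.

```python
def target_update_steps(total_steps, train_frequency, target_network_frequency, nested):
    """Return the global steps at which the target network would be synced.

    Hypothetical helper for illustration only; learning_starts is ignored.
    """
    updates = []
    for global_step in range(1, total_steps + 1):
        if nested:
            # Reported behavior: the target check only runs on training steps.
            if global_step % train_frequency == 0:
                # ... a gradient step would happen here ...
                if global_step % target_network_frequency == 0:
                    updates.append(global_step)
        else:
            # One possible fix: both checks are evaluated on every step.
            if global_step % train_frequency == 0:
                pass  # ... a gradient step would happen here ...
            if global_step % target_network_frequency == 0:
                updates.append(global_step)
    return updates


buggy = target_update_steps(20_000, 10, 501, nested=True)
fixed = target_update_steps(20_000, 10, 501, nested=False)
print(buggy)      # [5010, 10020, 15030]
print(fixed[:3])  # [501, 1002, 1503]
```

With the nested checks, only 3 syncs happen in 20,000 steps instead of 39.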

For example, with the default `--train-frequency` of 10, running

`python cleanrl/dqn.py --target-network-frequency 501`

updates the target network every 5010 timesteps (the least common multiple of 10 and 501), not every 501 timesteps.
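The arithmetic can be checked directly. This sketch assumes Python 3.9+ for `math.lcm`; the helper `effective_target_period` is hypothetical and simply encodes the "both conditions must hold" observation from the nested checks.

```python
import math


def effective_target_period(train_frequency, target_network_frequency):
    # With the nested checks, an update requires global_step to be a multiple of
    # both frequencies, i.e. a multiple of their least common multiple.
    return math.lcm(train_frequency, target_network_frequency)


for target_freq in (500, 501, 1000):
    period = effective_target_period(10, target_freq)
    print(target_freq, period, period == target_freq)
# 500 500 True
# 501 5010 False
# 1000 1000 True
```

Only frequencies that are multiples of `train_frequency` are unaffected, which matches the condition in the issue title.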