Closed manjavacas closed 1 year ago
Hi, thanks for raising this issue. This sounds like a good idea, especially since we are already doing Polyak updates in https://github.com/vwxyzjn/cleanrl/blob/3f5535cab409a34e9f071c10b96a234925d8a8d5/cleanrl/dqn_jax.py#L231 (see the optax docs on optax.incremental_update).
Feel free to make a PR.
Problem Description
Checklist
poetry install (see CleanRL's installation guideline)

Current Behavior
Currently, the DQN implementation performs a hard update of the target network. However, it is possible to perform soft updates instead, using a soft-update coefficient between 0 and 1 (a Polyak update).
Expected Behavior
Soft updates can increase the stability of learning, as detailed in the original DDPG paper, because the target values are then constrained to change slowly.
Although this idea came after the original implementation of DQN, it is equally applicable to this algorithm.
Finally, this solution is implemented in other reference libraries such as Stable-Baselines3, although I would understand if it is not added here for the sake of simplicity and adherence to the original DQN implementation.
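The update rule from the DDPG paper can be sketched in a few lines. This is an illustrative, hypothetical helper (parameter containers simplified to flat lists of floats; the names are not CleanRL's):

```python
# Illustrative sketch of a Polyak (soft) target-network update.
# Names and the flat-list parameter representation are hypothetical.
def polyak_update(online_params, target_params, tau):
    """Soft update: target <- tau * online + (1 - tau) * target."""
    return [tau * p + (1.0 - tau) * t for p, t in zip(online_params, target_params)]

# tau = 1.0 recovers the hard update DQN performs today; a small tau
# (e.g. 0.005, as in DDPG) makes the target track the online network slowly.
```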
Possible Solution
In the current DQN implementation, substitute the hard target-network update with a Polyak (soft) update.