Closed TroddenSpade closed 2 years ago
I definitely meant it the way you describe it.
Thank you for the fix! Will merge this weekend.
BTW, sorry for the delay, I'm not sure how I missed this.
Adding https://github.com/mimoralea/gdrl/pull/22 because there are multiple notebooks with this issue--thanks for reporting this and the pull request!
Dear @mimoralea, in the DDPG.train function, after the actor and critic networks are defined, the target networks' parameters should be set equal to the online networks' parameters. As the call
self.update_networks(tau=1.0)
suggests, assigning 1.0 to tau in update_networks
should copy the online parameters to the target networks. However, in the following function, the pre-defined self.tau
is used as the weight of the online parameters. Instead, the tau defined in the first line of the function should be used as the weight in the subsequent calculations.
These changes have been applied to the DDPG class in chapter-12.ipynb.
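For anyone reading along, here is a minimal sketch of the fix described above. It is not the notebook's code: the Agent class, the mix_weights helper, and the plain lists of floats standing in for network parameters are all hypothetical. It assumes update_networks takes an optional tau that falls back to self.tau. The point is only that the local tau, not self.tau, must be the weight of the online parameters.

```python
def mix_weights(target, online, tau):
    """Blend online weights into target weights.

    tau is the weight of the online parameters; tau=1.0 is a full copy.
    """
    return [(1.0 - tau) * t + tau * o for t, o in zip(target, online)]


class Agent:
    """Hypothetical stand-in for the DDPG class; floats replace tensors."""

    def __init__(self, tau=0.005):
        self.tau = tau
        self.online = [1.0, 2.0, 3.0]
        self.target = [0.0, 0.0, 0.0]

    def update_networks(self, tau=None):
        # Fall back to the configured self.tau only when no override is given.
        tau = self.tau if tau is None else tau
        # Use the local tau here, not self.tau, so that tau=1.0 is honored.
        self.target = mix_weights(self.target, self.online, tau)


agent = Agent(tau=0.005)
agent.update_networks(tau=1.0)   # initial sync: target becomes a copy of online
synced = list(agent.target)

agent.online = [2.0, 4.0, 6.0]   # pretend a training step changed the online weights
agent.update_networks()          # soft update using the default self.tau
```

With the buggy version (self.tau used in the blend), the first call would move the target only 0.5% of the way toward the online weights instead of copying them.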