ikostrikov / pytorch-a3c

PyTorch implementation of Asynchronous Advantage Actor Critic (A3C) from "Asynchronous Methods for Deep Reinforcement Learning".
MIT License
1.23k stars, 279 forks

SharedAdam bias correction wrong #12

Closed by pfrendl 7 years ago

pfrendl commented 7 years ago

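The timestep (step count) state of the SharedAdam optimizer is not shared between processes, even though the optimizer is meant to be shared. Each worker then computes Adam's bias correction from its own, smaller step count, which should overcorrect and make the bias-corrected moment estimates incorrect. Does the current implementation work?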
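To make the concern concrete, here is a minimal pure-Python sketch of the effect, not code from this repository. It assumes a constant gradient and a first-moment average that has received updates from every worker; the worker count, step counts, and the helper names `ema_of_constant` and `bias_corrected` are hypothetical and chosen only for illustration.

```python
def ema_of_constant(g, beta, num_updates):
    """Apply m <- beta * m + (1 - beta) * g for num_updates steps, starting at m = 0."""
    m = 0.0
    for _ in range(num_updates):
        m = beta * m + (1.0 - beta) * g
    return m


def bias_corrected(m, beta, step):
    """Adam-style bias correction: divide the moving average by 1 - beta**step."""
    return m / (1.0 - beta ** step)


beta1 = 0.9
g = 1.0                # constant gradient, so the unbiased estimate should be 1.0
num_workers = 4        # hypothetical number of worker processes
local_steps = 5        # updates performed by each worker (hypothetical)
global_steps = num_workers * local_steps

# The shared moving average has received updates from all workers.
m = ema_of_constant(g, beta1, global_steps)

# Correcting with the true (global) step count recovers the gradient.
correct = bias_corrected(m, beta1, global_steps)

# Correcting with one worker's private step count divides by a factor that
# is too small, so the estimate is inflated.
overcorrected = bias_corrected(m, beta1, local_steps)

print(f"global step count: {correct:.3f}")
print(f"local step count:  {overcorrected:.3f}")
```

Under these assumptions the locally corrected estimate is roughly twice the true gradient. The gap shrinks as the local step count grows, since `1 - beta**step` approaches 1, so the effect would matter mostly early in training.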

ikostrikov commented 7 years ago

Yes, that's true. Thanks! I will fix it in the next couple of days.

ikostrikov commented 7 years ago

Fixed in https://github.com/ikostrikov/pytorch-a3c/commit/5d9b07d80740e26f78cb283f74b5b802906a9d83