Closed: ShaniGam closed this issue 6 years ago.
Might be a more algorithmic question about A2C, but I would still like some help: in the A3C algorithm there is a separate model for each agent (plus a shared one), while in A2C there seems to be only the shared model. Why is that?

The A3C paper claimed that having each agent train on its own copy of the environment and then asynchronously update the shared weights introduces noise into the training, which has a regularising effect. However, experiments showed that this is not necessarily true, so most Actor-Critic implementations dropped the A for "asynchronous" and use a single policy across all environments. A2C also batches the updates over the parallel environments, which makes them faster and better suited to the GPU.
ok, now I get it. Thanks!