Closed: mkanakis closed this issue 4 years ago
Hi Marios,
Thanks for pointing this detail out! You are right, our implementation is essentially A2C. As noted by OpenAI, A2C sometimes outperforms A3C (which makes sense, because A3C can introduce more variance when the policy gradient doesn't contain off-policy corrections, as it does in IMPALA). We didn't need the extra speed boost from going fully asynchronous, so we stayed with A2C. Hope this helps!
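To make the distinction concrete, here is a minimal toy sketch (not the repo's actual code; the worker gradients are random stand-ins for real rollouts) of what "synchronous" means in A2C: all workers' gradient estimates are averaged and applied in one step, so every worker always sees the same parameters, unlike A3C where each worker pushes its gradient asynchronously.

```python
import numpy as np

# Toy shared policy parameters (a stand-in linear policy).
PARAM_DIM = 4

def worker_gradient(params, seed):
    """Hypothetical per-worker policy-gradient estimate.

    Random numbers stand in for a real rollout's advantage and
    grad log pi(a|s); only the update pattern matters here.
    """
    rng = np.random.default_rng(seed)
    advantage = rng.normal()
    grad_log_pi = rng.normal(size=params.shape)
    return advantage * grad_log_pi

def a2c_step(params, n_workers, lr=0.1):
    """Synchronous (A2C-style) update: gather gradients from every
    worker, average them, and apply a single shared update."""
    grads = [worker_gradient(params, seed=i) for i in range(n_workers)]
    mean_grad = np.mean(grads, axis=0)
    return params + lr * mean_grad

params = a2c_step(np.zeros(PARAM_DIM), n_workers=8)
print(params.shape)
```

In A3C, by contrast, each worker would apply its own gradient to the shared parameters independently (a Hogwild-style update), which is where the extra variance can come from.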
Best, Hongzi
Thanks a lot for the clarification!
Closing.
Best, Marios.
Hi,
I know this is more of a technicality, but I would like to clarify this.
A3C stands for Asynchronous Advantage Actor-Critic, whereas A2C is its synchronous variant, as described by OpenAI.
Isn't it the case that your multi_agent implementation uses synchronous network updates for all agents, as seen in this snippet:
If so, then you are updating the model for all agents at the same time, so we are talking about an A2C rather than an A3C implementation?
Is my thought process correct?
Kind regards, Marios.