ikostrikov / pytorch-a2c-ppo-acktr-gail

PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) and Generative Adversarial Imitation Learning (GAIL).

A2C has only a shared model #105

Closed: ShaniGam closed this issue 6 years ago

ShaniGam commented 6 years ago

This might be more of an algorithmic question about A2C, but I would still appreciate some help. In the A3C algorithm there is a separate model for each agent (plus a shared one), but in A2C there seems to be only a single shared model. Why is that?

timmeinhardt commented 6 years ago

The A3C paper claimed that training a separate agent for each environment and then updating the shared weights asynchronously introduces noise into the training, which has a regularising effect. However, experiments showed that this is not necessarily true, so most actor-critic implementations dropped the "A" for asynchronous and use a single policy for all environments, stepping them synchronously. A2C also allows for faster and better GPU optimisation, since the update is computed over one batch instead of many asynchronous workers.
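To illustrate the difference, here is a minimal sketch of the synchronous A2C idea: one shared actor-critic network steps a batch of environments in lockstep and performs a single batched update, instead of A3C's per-worker copies. This is not this repository's code; the network, hyperparameters, and the random tensors standing in for `env.step()` are all illustrative assumptions.

```python
# Hypothetical sketch of synchronous A2C with ONE shared model for all envs.
import torch
import torch.nn as nn

class ActorCritic(nn.Module):
    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh())
        self.policy = nn.Linear(hidden, n_actions)  # actor head
        self.value = nn.Linear(hidden, 1)           # critic head

    def forward(self, obs):
        h = self.body(obs)
        return self.policy(h), self.value(h).squeeze(-1)

num_envs, obs_dim, n_actions, n_steps, gamma = 8, 4, 2, 5, 0.99
model = ActorCritic(obs_dim, n_actions)             # the single shared model
optimizer = torch.optim.RMSprop(model.parameters(), lr=7e-4)

obs = torch.randn(num_envs, obs_dim)                # fake batched observations
log_probs, values, rewards = [], [], []

# Collect a short rollout from all environments at once (synchronously).
for _ in range(n_steps):
    logits, value = model(obs)
    dist = torch.distributions.Categorical(logits=logits)
    action = dist.sample()
    log_probs.append(dist.log_prob(action))
    values.append(value)
    rewards.append(torch.randn(num_envs))           # fake rewards from env.step()
    obs = torch.randn(num_envs, obs_dim)            # fake next observations

# Bootstrapped n-step returns, computed backwards over the rollout.
with torch.no_grad():
    _, ret = model(obs)
returns = []
for r in reversed(rewards):
    ret = r + gamma * ret
    returns.insert(0, ret)

values = torch.stack(values)
advantages = torch.stack(returns) - values
policy_loss = -(torch.stack(log_probs) * advantages.detach()).mean()
value_loss = advantages.pow(2).mean()

# One synchronous update over the whole batch of environments.
optimizer.zero_grad()
(policy_loss + 0.5 * value_loss).backward()
optimizer.step()
```

Because every environment is stepped with the same parameters, the gradient is computed once per rollout on a large batch, which is what makes GPU utilisation better than with asynchronous per-worker updates.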

ShaniGam commented 6 years ago

ok, now I get it. Thanks!