ikostrikov / pytorch-a2c-ppo-acktr-gail

PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) and Generative Adversarial Imitation Learning (GAIL).
MIT License

potential bugs in kfac.py #54

Open gd-zhang opened 6 years ago

gd-zhang commented 6 years ago

In `compute_cov_a`, `a = a.view(-1, a.size(-1)).div(a.size(1)).div_(a.size(2))` should be `a = a.view(-1, a.size(-1))`.

In `compute_cov_g`, `g = g.view(-1, g.size(-1)).mul(g.size(1)).mul(g.size(2))` should be `g = g.view(-1, g.size(-1))`, and `g = g * batch_size` should be deleted. @ikostrikov Why do you multiply g by batch_size?
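To illustrate what's being debated: the two variants of the activation flattening differ only by a constant spatial scaling factor that propagates into the Kronecker factor. A minimal sketch, assuming a hypothetical conv activation patch tensor of shape `(batch, spatial_h, spatial_w, channels)` (the exact shapes in `kfac.py` may differ):

```python
import torch

# Hypothetical conv activation patches: (batch, spatial_h, spatial_w, channels).
a = torch.randn(8, 5, 5, 16)

# Variant currently in the repo: flatten, then divide out the spatial dims.
a_repo = a.view(-1, a.size(-1)).div(a.size(1)).div_(a.size(2))

# Variant proposed in this issue: plain flatten, no spatial rescaling.
a_fix = a.view(-1, a.size(-1))

# They differ only by the constant factor spatial_h * spatial_w,
# which carries through into the covariance estimate a^T a / n.
print(torch.allclose(a_repo * (a.size(1) * a.size(2)), a_fix))
```

So neither variant changes the eigenvectors of the factor, only its scale; the question is which scale matches the Fisher approximation the update assumes.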

ikostrikov commented 6 years ago

Hi,

cov_a and cov_g are implemented this way in order to be consistent with the original KFAC code from OpenAI.

It's true that it could be implemented in a different way.

I multiply by batch_size because these gradients were averaged in the loss function. See: https://github.com/openai/baselines/blob/master/baselines/acktr/kfac.py#L417
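The batch_size rescaling can be checked in isolation: when the loss is a mean over the batch, autograd's backward gradients carry a 1/batch_size factor, so multiplying by batch_size recovers the sum-over-samples scale. A minimal sketch with a hypothetical linear layer (not the repo's actual code):

```python
import torch

batch_size = 4
x = torch.randn(batch_size, 3)
w = torch.randn(3, 2, requires_grad=True)
out = x @ w

# Gradient of a batch-mean loss carries a 1/batch_size factor...
g_mean = torch.autograd.grad(out.mean(dim=0).sum(), w, retain_graph=True)[0]
# ...compared to the gradient of the batch-sum loss.
g_sum = torch.autograd.grad(out.sum(), w)[0]

# Rescaling by batch_size recovers the sum-loss gradient, which is the
# scale the Fisher-factor statistics are meant to be accumulated at.
print(torch.allclose(g_mean * batch_size, g_sum))
```

This is the consistency argument: whether the extra factor belongs in `compute_cov_g` or should be dropped depends on which loss normalization the rest of the ACKTR update assumes.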