ikostrikov / pytorch-a3c

PyTorch implementation of Asynchronous Advantage Actor Critic (A3C) from "Asynchronous Methods for Deep Reinforcement Learning".
MIT License

question about using GAE #34

Closed andrewliao11 closed 7 years ago

andrewliao11 commented 7 years ago

I found that in the original GAE paper, eq. 16 is: A_{t}^{GAE} = \sum_{l=0}^{\infty} (\gamma \tau)^l \delta_{t+l}^{V}

However, in the code the advantage looks like this: https://github.com/ikostrikov/pytorch-a3c/blob/master/train.py#L97

gae = gae * args.gamma * args.tau + delta_t

Shouldn't it be modified to:

gae += args.gamma * args.tau * delta_t

I haven't implemented code with GAE before, so I'm just curious about this

ikostrikov commented 7 years ago

Hi, sorry for the late reply. In the original equation the sum is computed in a forward pass, while here it is computed in a backward pass over the rollout, from the last step to the first.
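To see why the two forms agree, note that the sum in eq. 16 satisfies the recursion A_t = \delta_t + \gamma \tau A_{t+1}, so iterating from the last step backward reproduces it. Below is a minimal sketch, assuming a finite rollout with the sum truncated at its end. The function names and the toy `deltas` values are made up for illustration and are not taken from train.py.

```python
def gae_forward(deltas, gamma, tau):
    """Evaluate the (truncated) sum from eq. 16 directly for every step t."""
    T = len(deltas)
    return [
        sum((gamma * tau) ** l * deltas[t + l] for l in range(T - t))
        for t in range(T)
    ]


def gae_backward(deltas, gamma, tau):
    """Same quantity via the backward recursion used in the training loop."""
    T = len(deltas)
    advantages = [0.0] * T
    gae = 0.0
    for t in reversed(range(T)):
        # A_t = delta_t + (gamma * tau) * A_{t+1}
        gae = gae * gamma * tau + deltas[t]
        advantages[t] = gae
    return advantages


def gae_proposed(deltas, gamma, tau):
    """The update suggested in the question, kept only for comparison."""
    T = len(deltas)
    advantages = [0.0] * T
    gae = 0.0
    for t in reversed(range(T)):
        # Every delta gets the same weight gamma * tau, with no (gamma * tau)^l decay.
        gae += gamma * tau * deltas[t]
        advantages[t] = gae
    return advantages


# Hypothetical TD residuals for a three-step rollout.
deltas = [1.0, 2.0, 3.0]
forward = gae_forward(deltas, gamma=0.5, tau=1.0)
backward = gae_backward(deltas, gamma=0.5, tau=1.0)
proposed = gae_proposed(deltas, gamma=0.5, tau=1.0)
```

With these toy values the forward and backward versions both give [2.75, 3.5, 3.0], while the proposed update gives [3.0, 2.5, 1.5], because it weights every later residual equally instead of discounting it by its distance from step t.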