Closed andrewliao11 closed 7 years ago
In the original GAE paper (eq. 16), the advantage is defined as A_{t}^{GAE} = \sum_{l=0}^{\infty} (\gamma \tau)^l \delta_{t+l}^{V}
However, in the code the advantage is computed like this: https://github.com/ikostrikov/pytorch-a3c/blob/master/train.py#L97
gae = gae * args.gamma * args.tau + delta_t
Shouldn't it be modified to:
gae += args.gamma * args.tau * delta_t
I haven't implemented code with GAE before, so I'm just curious about this
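Hi, sorry for the late reply. The two are equivalent. The equation in the paper is written as a forward sum over future time steps, while the code computes the same quantity in a backward pass over the rollout. Each step of the loop multiplies the running gae by args.gamma * args.tau and then adds delta_t, so every later delta picks up one extra factor of gamma * tau per step, which is exactly the (\gamma \tau)^l weighting in eq. 16.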
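To make the equivalence concrete, here is a minimal sketch in plain Python (not the repository's code). The function names are hypothetical, and it assumes the infinite sum in eq. 16 is truncated at the end of a finite rollout. It compares the direct forward sum, the backward recursion from train.py, and the update proposed in the question.

```python
def gae_forward(deltas, gamma, tau):
    """Eq. 16 evaluated directly, truncated at the end of the rollout (assumption)."""
    T = len(deltas)
    return [
        sum((gamma * tau) ** l * deltas[t + l] for l in range(T - t))
        for t in range(T)
    ]


def gae_backward(deltas, gamma, tau):
    """Backward recursion: gae = gae * gamma * tau + delta_t."""
    gae = 0.0
    advantages = [0.0] * len(deltas)
    for t in reversed(range(len(deltas))):
        gae = gae * gamma * tau + deltas[t]
        advantages[t] = gae
    return advantages


def gae_proposed(deltas, gamma, tau):
    """Update suggested in the question: gae += gamma * tau * delta_t."""
    gae = 0.0
    advantages = [0.0] * len(deltas)
    for t in reversed(range(len(deltas))):
        gae += gamma * tau * deltas[t]
        advantages[t] = gae
    return advantages


if __name__ == "__main__":
    deltas = [1.0, 2.0, 3.0]  # made-up TD residuals for illustration
    print(gae_forward(deltas, 0.9, 0.5))
    print(gae_backward(deltas, 0.9, 0.5))
    print(gae_proposed(deltas, 0.9, 0.5))
```

The first two functions agree on every time step. The proposed update applies a single factor of gamma * tau to every delta, including the current one, so it does not reproduce the (\gamma \tau)^l weighting.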