GoingMyWay closed this issue 6 years ago
For a weight w shared between the value and policy networks, the gradient update should be
w = w - grad(v, w)
followed by
w = w - grad(p, w)
which is the same as
w = w - grad(v + p, w)
by the sum rule (assuming both gradients are evaluated at the same weights),
where p and v are the policy and value losses respectively.
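A quick numerical check of this sum rule, as a minimal sketch assuming PyTorch autograd (the tiny weight vector and the two made-up losses below are only for illustration, not the actual a3c.py model):

```python
import torch

# A single shared weight vector standing in for the shared CNN torso
w = torch.randn(4, requires_grad=True)
x = torch.randn(4)

# Made-up stand-ins for the policy loss p and value loss v, both depending on w
p = (w * x).sum() ** 2
v = ((w * x).sum() - 1.0) ** 2

# Gradient of each loss separately with respect to the shared weight
grad_p, = torch.autograd.grad(p, w, retain_graph=True)
grad_v, = torch.autograd.grad(v, w, retain_graph=True)

# Gradient of the combined loss
grad_total, = torch.autograd.grad(p + v, w)

# grad(v + p, w) == grad(v, w) + grad(p, w), up to floating point error
print(torch.allclose(grad_p + grad_v, grad_total))  # True
```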
Hi, in a3c.py the gradients are computed from the total loss. However, in the original A3C paper there are two loss functions, a policy loss and a value loss, and each of them contributes gradient updates to the shared parameters, including the weights of the CNN.
I think the code should look something like this:
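(Roughly the following sketch is what I mean; `model`, `optimizer`, `batch`, and the two loss functions are placeholders assuming a PyTorch-style `.backward()` / `optimizer.step()` API, not the actual a3c.py code.)

```python
def update_separately(model, optimizer, batch,
                      compute_value_loss, compute_policy_loss):
    """Apply the value and policy gradients as two separate updates.

    All arguments are placeholders for whatever a3c.py actually uses.
    """
    # First update: gradient of the value loss w.r.t. all shared weights
    optimizer.zero_grad()
    compute_value_loss(model, batch).backward()
    optimizer.step()

    # Second update: recompute the policy loss at the new weights,
    # then apply its gradient as well
    optimizer.zero_grad()
    compute_policy_loss(model, batch).backward()
    optimizer.step()
```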
Can you share your opinions on this?