openai / universe-starter-agent

A starter agent that can solve a number of universe environments.
MIT License

loss function issue #116

Closed GoingMyWay closed 6 years ago

GoingMyWay commented 7 years ago

Hi, in a3c.py the gradients are computed from the total loss. However, in the original A3C paper there are two loss functions, a policy loss and a value loss, and each one contributes its own gradient update to the network weights, including the shared CNN weights.

[image: the gradient accumulation rules for the policy and value losses, from the A3C paper]

I think the code should look like this:

pi_grads = tf.gradients(pi_loss, pi.var_list)  # gradient of the policy loss w.r.t. the policy (and shared) weights
vf_grads = tf.gradients(vf_loss, vf.var_list)  # gradient of the value loss w.r.t. the value (and shared) weights
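
For contrast, the current approach computes a single gradient pass from one combined scalar. Here is a minimal, self-contained sketch of that pattern; the toy losses, the variable names, and the 0.5 value-loss weight are illustrative, not copied from a3c.py:

import numpy as np
import tensorflow as tf

# Toy stand-in for a weight shared by the policy and value heads (illustrative only).
shared_w = tf.Variable(np.ones(3, dtype=np.float32), name="shared_cnn_weight")
pi_loss = tf.reduce_sum(shared_w ** 2)           # toy policy loss
vf_loss = tf.reduce_sum((shared_w - 2.0) ** 2)   # toy value loss

total_loss = pi_loss + 0.5 * vf_loss             # one combined scalar loss
grads = tf.gradients(total_loss, [shared_w])     # a single gradient pass covers the shared weights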

Can you share your opinions on this?

sanjeevk42 commented 7 years ago

For a weight w shared between the value and policy networks, the gradient update is w = w - grad(v, w) followed by w = w - grad(p, w), which is the same as w = w - grad(v + p, w), since gradients are linear: grad(v + p, w) = grad(v, w) + grad(p, w) (the sum rule). Here p and v are the policy and value losses respectively, and the learning rate is omitted for brevity.
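
A quick numerical check of this sum-rule argument; a minimal sketch with made-up toy losses p and v (not the actual a3c.py losses), assuming a TensorFlow 1.x graph session:

import tensorflow as tf

w = tf.Variable([1.0, 2.0], name="shared_weight")  # weight shared by both heads
p = tf.reduce_sum(w ** 2)                          # stand-in for the policy loss
v = tf.reduce_sum(3.0 * w)                         # stand-in for the value loss

grad_total = tf.gradients(p + v, [w])[0]                        # gradient of the combined loss
grad_split = tf.gradients(p, [w])[0] + tf.gradients(v, [w])[0]  # separate gradients, then summed

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run([grad_total, grad_split]))  # both are [5., 7.]: grad(p + v, w) == grad(p, w) + grad(v, w)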