Closed sarvghotra closed 8 years ago
Hey, the multiplication used to compute the gradients is done automatically by the automatic differentiation built into TensorFlow. You have probably already figured out that ddpg.py
is the relevant file. The line where the gradient with respect to the policy parameters is computed is
grads_and_vars_p = optim_p.compute_gradients(loss_p, var_list=self.theta_p)
where loss_p is -Q. In the next line the actor parameters are updated via
optimize_p = optim_p.apply_gradients(grads_and_vars_p)
Does this sound reasonable?
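To see why minimizing loss_p = -Q with respect to the policy parameters gives exactly the chain-rule product from the DDPG paper, here is a minimal, self-contained sketch. The functions mu and q are toy stand-ins (not the repo's networks), and finite differences play the role of TensorFlow's automatic differentiation:

```python
# Toy scalar example (hypothetical, not the repo's code): policy a = mu(s; theta),
# critic Q(s, a). Differentiating loss_p = -Q(s, mu(s; theta)) with respect to
# theta yields the DDPG actor update:  d(-Q)/d(theta) = -(dQ/da) * (da/dtheta)

def mu(s, theta):          # toy deterministic policy: a = theta * s
    return theta * s

def q(s, a):               # toy critic: Q(s, a) = 2*s*a - a**2
    return 2 * s * a - a * a

def dq_da(s, a):           # analytic critic gradient w.r.t. the action
    return 2 * s - 2 * a

def dmu_dtheta(s, theta):  # analytic policy gradient w.r.t. theta
    return s

def finite_diff(f, x, eps=1e-6):   # stand-in for TF's autodiff
    return (f(x + eps) - f(x - eps)) / (2 * eps)

s, theta = 1.5, 0.3
a = mu(s, theta)

# gradient of loss_p = -Q computed "automatically" (finite differences here)
auto_grad = finite_diff(lambda th: -q(s, mu(s, th)), theta)

# chain-rule product used in the paper's actor update
chain_grad = -dq_da(s, a) * dmu_dtheta(s, theta)

print(abs(auto_grad - chain_grad) < 1e-4)  # the two agree
```

So there is no explicit multiplication of the two gradients anywhere in the code; backpropagating through the critic into the actor performs it implicitly.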
Btw there is a paragraph about automatic differentiation in my bachelor thesis on page 5.
Thanks.
Could you pinpoint the code where the actor's parameters (weights) are being updated?
I am particularly looking for the step where the gradient of the critic is computed with respect to the action variables and that of the actor with respect to theta. The sum of the products of these gradients is used to update the actor (as given in the algorithm in the paper).
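The "sum of the products" from the paper's algorithm is what automatic differentiation produces when it differentiates the batch-mean of -Q through the policy. A minimal sketch with toy functions (hypothetical, not the repo's code; finite differences again stand in for autodiff):

```python
# Minibatch version of the same equivalence: the mean over the batch of
# (dQ/da) * (dmu/dtheta) equals the autodiff gradient of the batch-mean loss -Q.

def mu(s, theta):            # toy policy: a = theta * s
    return theta * s

def q(s, a):                 # toy critic: Q(s, a) = s*a - 0.5*a**2
    return s * a - 0.5 * a * a

def dq_da(s, a):             # analytic critic gradient w.r.t. the action
    return s - a

def dmu_dtheta(s, theta):    # analytic policy gradient w.r.t. theta
    return s

states = [0.5, 1.0, 2.0]     # a tiny "minibatch" of states
theta = 0.4

# chain-rule form from the paper: mean over the batch of (dQ/da)*(dmu/dtheta)
chain = -sum(dq_da(s, mu(s, theta)) * dmu_dtheta(s, theta)
             for s in states) / len(states)

# autodiff stand-in: finite difference of the batch-mean loss -Q
eps = 1e-6
loss = lambda th: -sum(q(s, mu(s, th)) for s in states) / len(states)
auto = (loss(theta + eps) - loss(theta - eps)) / (2 * eps)

print(abs(auto - chain) < 1e-4)  # the two agree
```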