rmst / ddpg

TensorFlow implementation of the DDPG algorithm from the paper Continuous Control with Deep Reinforcement Learning (ICLR 2016)
MIT License

Need your help to understand a step #2

Closed sarvghotra closed 8 years ago

sarvghotra commented 8 years ago

Could you pinpoint the code where the actor's parameters (weights) are updated?

I am particularly looking for the step where the gradient of the critic is computed with respect to the action variables, and the gradient of the actor with respect to theta. The sum over the batch of the product of these gradients is used to update the actor, as given in the algorithm in the paper.
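For reference, the sampled policy-gradient update from Algorithm 1 of the paper is

$$
\nabla_{\theta^\mu} J \approx \frac{1}{N} \sum_i \nabla_a Q(s, a \mid \theta^Q)\big|_{s=s_i,\, a=\mu(s_i)} \, \nabla_{\theta^\mu} \mu(s \mid \theta^\mu)\big|_{s_i}
$$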

rmst commented 8 years ago

Hey, the multiplication that computes the gradient is done automatically by the automatic differentiation built into TensorFlow. You have probably already figured out that ddpg.py is the relevant file. The line where the gradient with respect to the policy parameters is computed is grads_and_vars_p = optim_p.compute_gradients(loss_p, var_list=self.theta_p), where loss_p is -Q. In the next line the actor parameters are updated via optimize_p = optim_p.apply_gradients(grads_and_vars_p).
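Here is a minimal sketch of that idea, not the repo's exact code: it assumes TF1-style graph mode (via tf.compat.v1) and uses toy one-layer networks standing in for the actor and critic. Backpropagating loss_p = -Q into only the actor's variables makes TensorFlow apply the chain rule dQ/da * da/dtheta for you.

```python
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

obs_dim, act_dim = 3, 1
s = tf.placeholder(tf.float32, [None, obs_dim])

# Toy stand-ins for the actor mu(s | theta_p) and critic Q(s, a).
with tf.variable_scope("actor"):
    a = tf.layers.dense(s, act_dim, activation=tf.tanh)
with tf.variable_scope("critic"):
    q = tf.layers.dense(tf.concat([s, a], axis=1), 1)

theta_p = tf.trainable_variables(scope="actor")

# loss_p is -Q: minimizing it performs gradient ascent on Q.
loss_p = -tf.reduce_mean(q)

optim_p = tf.train.AdamOptimizer(1e-4)
# Gradients flow through the critic into the actor; only the actor's
# variables are in var_list, so only theta_p receives updates.
grads_and_vars_p = optim_p.compute_gradients(loss_p, var_list=theta_p)
optimize_p = optim_p.apply_gradients(grads_and_vars_p)
```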

Does this sound reasonable?
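If you want to see the two factors from the paper separately, some DDPG implementations compute them explicitly instead. Continuing the sketch above, this is equivalent to the single compute_gradients call:

```python
# dQ/da evaluated at a = mu(s).
dq_da = tf.gradients(q, a)[0]
# Chain rule: sum over the batch of (dQ/da * dmu/dtheta_p);
# negated so that applying the "gradient" ascends Q.
policy_grads = tf.gradients(a, theta_p, grad_ys=-dq_da)
optimize_p_manual = optim_p.apply_gradients(list(zip(policy_grads, theta_p)))
```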

Btw there is a paragraph about automatic differentiation in my bachelor thesis on page 5.

sarvghotra commented 8 years ago

Thanks.