yhyu13 / C51-DDPG

This is a TensorFlow implementation of DeepMind's *A Distributional Perspective on Reinforcement Learning* (C51-DDPG).

“q_gradient_batch” in ddpg.py #1

Closed. zcchenvy closed this issue 5 years ago.

zcchenvy commented 5 years ago

Line 103: why is the direction of the gradient flipped? I think this step is not required, and I do not really understand your reasoning.

yhyu13 commented 5 years ago

According to the DDPG paper (page 5), the actor should be updated with the plain q_gradient_batch, i.e. the gradient should be added to the actor parameters (gradient ascent).

But the actor train function (see here) uses a TensorFlow optimizer, which by default performs gradient descent, meaning any gradient handed to it is subtracted. That's why I flip the sign of the gradient, so that the net effect is still an addition.
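For concreteness, here is a minimal sketch of that pattern in a TF1-style graph; the names (`actor_output`, `actor_params`, `q_gradient_input`, the toy network dimensions) are illustrative placeholders and not necessarily the ones used in ddpg.py:

```python
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

state_dim, action_dim, learning_rate = 3, 1, 1e-4

# A toy actor network: state -> action.
state_input = tf.placeholder(tf.float32, [None, state_dim])
actor_output = tf.layers.dense(state_input, action_dim, activation=tf.nn.tanh)
actor_params = tf.trainable_variables()

# dQ/da for the batch, supplied by the critic at run time
# (this plays the role of q_gradient_batch).
q_gradient_input = tf.placeholder(tf.float32, [None, action_dim])

# Chain rule: passing -dQ/da as grad_ys yields d(-Q)/dtheta = -(dQ/da)(da/dtheta).
actor_gradients = tf.gradients(actor_output, actor_params, -q_gradient_input)

# apply_gradients performs gradient *descent* (it subtracts what it is given),
# so descending -dQ/dtheta is the same as ascending dQ/dtheta:
# theta <- theta + lr * dQ/dtheta, which is the DDPG actor update.
train_op = tf.train.AdamOptimizer(learning_rate).apply_gradients(
    zip(actor_gradients, actor_params))
```

The sign flip only compensates for the optimizer's built-in subtraction; the resulting parameter update is exactly the ascent step the paper prescribes.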

zcchenvy commented 5 years ago


Thank you.