madvn / DDPG

Deep Deterministic Policy Gradients in TF r2.0

Problem about updating the actor network. #3

Closed: Jian-Yin-Shine closed this issue 4 years ago

Jian-Yin-Shine commented 4 years ago

I read your code for updating the actor network, and it is different from the paper. In the paper it is: $-\frac{1}{\text{batch\_size}}\sum_{i=1}^{\text{batch\_size}} Q(s_i, a_i)$. Thank you very much for your answer.

madvn commented 4 years ago

Not sure what you are referring to. I have implemented the algorithm on page 5 of the paper: https://arxiv.org/pdf/1509.02971.pdf
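For readers comparing the two forms: the algorithm in the paper updates the actor with the sampled policy gradient, $\frac{1}{N}\sum_i \nabla_a Q(s_i, a)|_{a=\mu(s_i)} \nabla_\theta \mu(s_i|\theta)$. By the chain rule this should be the negative of the gradient of the loss $-\frac{1}{N}\sum_i Q(s_i, \mu(s_i|\theta))$, provided the action is the actor's output $\mu(s_i)$ rather than a stored replay action $a_i$. So the two formulations may well describe the same update. The sketch below checks this on a toy problem; the 1-D linear actor and quadratic critic are hypothetical stand-ins, not the networks in this repository.

```python
# Toy check that the paper's sampled policy gradient equals the negative
# gradient of the loss -mean(Q(s, mu(s))).
# The actor and critic below are hypothetical 1-D stand-ins chosen so the
# derivatives can be written by hand; they are not this repository's models.

def mu(theta, s):
    """Hypothetical linear actor: a = theta * s."""
    return theta * s

def q(s, a):
    """Hypothetical critic, largest when a = 2 * s."""
    return -(a - 2.0 * s) ** 2

def dq_da(s, a):
    """Analytic derivative of the toy critic with respect to the action."""
    return -2.0 * (a - 2.0 * s)

def actor_loss(theta, states):
    """Loss form: -(1/N) * sum_i Q(s_i, mu(s_i))."""
    return -sum(q(s, mu(theta, s)) for s in states) / len(states)

def sampled_policy_gradient(theta, states):
    """Paper form: (1/N) * sum_i dQ/da|a=mu(s_i) * dmu/dtheta, with dmu/dtheta = s_i."""
    return sum(dq_da(s, mu(theta, s)) * s for s in states) / len(states)

def numerical_loss_gradient(theta, states, eps=1e-6):
    """Central finite difference of the loss with respect to theta."""
    upper = actor_loss(theta + eps, states)
    lower = actor_loss(theta - eps, states)
    return (upper - lower) / (2.0 * eps)

states = [0.5, -1.0, 2.0, 1.5]  # made-up minibatch of states
theta = 0.3                     # made-up actor parameter

pg = sampled_policy_gradient(theta, states)
loss_grad = numerical_loss_gradient(theta, states)

print("sampled policy gradient:", pg)
print("gradient of the loss:   ", loss_grad)
```

On this toy problem the two printed values differ only in sign, so gradient ascent with the paper's expression and gradient descent on the loss move the actor parameter in the same direction.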