stevenpjg / ddpg-aigym

Continuous control with deep reinforcement learning - Deep Deterministic Policy Gradient (DDPG) algorithm implemented in OpenAI Gym environments
MIT License

A question on action_gradients in critic_net_bn.py #12

Closed pxlong closed 6 years ago

pxlong commented 7 years ago

Hi,

I just read through your DDPG implementation, and it looks awesome. Thanks for sharing!

I am confused by the following line in critic_net_bn.py: self.action_gradients = [self.act_grad_v[0]/tf.to_float(tf.shape(self.act_grad_v[0])[0])]

Why do we add [0] after self.act_grad_v, given that we use a batch of actions to compute the gradients? What is the "[0]" for?

Thank you so much!

stevenpjg commented 6 years ago

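self.act_grad_v is a list, and [0] retrieves its first element: the gradient tensor for the whole batch of actions. It does not pick out the first sample of the batch.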
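
To make the indexing concrete, here is a minimal pure-Python sketch of the same pattern. It assumes self.act_grad_v is produced by a tf.gradients call (the thread does not show that line); tf.gradients returns a list with one tensor per input it differentiates against, so with a single input the list has length 1. The helper fake_gradients, the linear Q function, and the sample values are hypothetical stand-ins that only imitate that return convention; they are not the repository's code.

```python
# Pure-Python stand-in for the pattern in the question; no TensorFlow needed.
# `fake_gradients` is a hypothetical helper that mimics only the return
# convention of tf.gradients(ys, xs): a list with one gradient per entry in xs.

def fake_gradients(weights, xs):
    """Gradient of Q(a) = sum_j weights[j] * a[j], summed over the batch.

    Each x in xs is a batch of actions shaped [batch_size, action_dim],
    stored as nested lists. For this linear Q, every row's gradient is
    simply `weights`.
    """
    return [[list(weights) for _row in x] for x in xs]


# A batch of 4 actions, each with 2 dimensions (made-up values).
action_batch = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
weights = [2.0, -4.0]

# One input was passed, so the returned list has exactly one element.
act_grad_v = fake_gradients(weights, [action_batch])

# [0] unwraps the list; the result still covers the whole batch.
grad_batch = act_grad_v[0]
batch_size = len(grad_batch)  # analogue of tf.shape(self.act_grad_v[0])[0]

# Divide every entry by the batch size and re-wrap the result in a list,
# mirroring the structure of the line in the question.
action_gradients = [[[g / batch_size for g in row] for row in grad_batch]]

print(len(act_grad_v))         # 1: one gradient per differentiated input
print(batch_size)              # 4: the batch dimension is still intact
print(action_gradients[0][0])  # [0.5, -1.0]: first row, scaled by 1/4
```

In this sketch the first [0] only unwraps the one-element list, while the trailing [0] on the shape reads the batch size, so the division averages the gradient over the batch rather than discarding any samples.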