Hi,
I just read through your DDPG implementation, and it looks awesome. Thanks for sharing!
I'm currently confused by this line:
self.action_gradients = [self.act_grad_v[0]/tf.to_float(tf.shape(self.act_grad_v[0])[0])]
in critic_net_bn.py. Why is there a [0] after self.act_grad_v, given that we compute the gradients over a batch of actions? What is the "[0]" for?
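For context, here is my current reading of that line as a NumPy analogy (assuming self.act_grad_v comes from tf.gradients, which returns a list with one tensor per input variable); please correct me if I'm wrong:

```python
import numpy as np

# Stand-in for the output of tf.gradients(q_value, action):
# a LIST containing one gradient tensor of shape (batch_size, action_dim).
gradient_list = [np.array([[1.0, 2.0],
                           [3.0, 4.0]])]

grad = gradient_list[0]        # first [0]: unwrap the single tensor from the list
batch_size = grad.shape[0]     # second [0] (tf.shape(...)[0]): the batch dimension
scaled_grad = grad / batch_size  # divide by batch size, as in the line above
```

So my guess is that the first [0] just unwraps the list returned by tf.gradients, and the division by tf.shape(...)[0] averages the gradient over the batch — is that right?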
Thank you so much!