I've done some minor testing in the past with detaching variables (in PyTorch specifically, not so much in TensorFlow) and found that, in general, it doesn't really affect learning. I tested by running the code multiple times, with different combinations of variables detached, and found virtually no difference that wasn't within the normal error bars of reinforcement learning.
In the majority of my code on YouTube, I don't bother with it, and the agents work anyway.
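For concreteness, here is a tiny self-contained PyTorch sketch (not taken from the repo) of what detaching a target actually changes: the loss value is identical either way, but the gradient that reaches the shared parameter differs.

import torch

def grad_wrt_w(detach_target: bool) -> float:
    # Toy setup: both the prediction and the "target" depend on the same parameter w.
    w = torch.tensor([2.0], requires_grad=True)
    prediction = w * 3.0
    target = w * 5.0
    if detach_target:
        target = target.detach()  # treat the target as a constant
    loss = (prediction - target).pow(2).mean()
    loss.backward()
    return w.grad.item()

# With detach, gradients flow only through the prediction branch;
# without it, they also flow back through the target branch.
print(grad_wrt_w(detach_target=True))   # -24.0
print(grad_wrt_w(detach_target=False))  #  16.0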
Thanks a lot!
Thanks for sharing your code! There is something that bothers me: in the PyTorch SAC code, do we need to use with torch.no_grad() or detach() when computing value_target and q_hat?
value_target = critic_value - log_probs # line 96
q_hat = self.scale*reward + self.gamma*value_ # line 116
I think we need to stop the gradient computation for them:
value_target = value_target.detach()
q_hat = q_hat.detach()
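If you did that, the two losses would look roughly like the sketch below. This is only a sketch: the argument names (value, critic_value, log_probs, q1_old_policy, q2_old_policy, value_, reward, scale, gamma) are assumptions based on the lines quoted above, not the repo's actual learn() method.

import torch.nn.functional as F

def value_and_critic_losses(value, critic_value, log_probs,
                            q1_old_policy, q2_old_policy,
                            value_, reward, scale, gamma):
    # Value-network target: treat the critic output and log-probs as constants.
    value_target = (critic_value - log_probs).detach()
    value_loss = 0.5 * F.mse_loss(value, value_target)

    # Critic target: the bootstrapped next-state value is treated as a constant.
    q_hat = (scale * reward + gamma * value_).detach()
    critic_loss = (0.5 * F.mse_loss(q1_old_policy, q_hat)
                   + 0.5 * F.mse_loss(q2_old_policy, q_hat))
    return value_loss, critic_loss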
The author of the TD3 algorithm uses with torch.no_grad() to compute target_Q (https://github.com/sfujim/TD3/blob/master/TD3.py#L110):
with torch.no_grad():
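The same effect with a context manager would look roughly like this (a simplified sketch, not the actual TD3 code: the target-policy smoothing noise and the twin-critic minimum are omitted, and the network/tensor names are assumptions):

import torch

def compute_target_q(critic_target, actor_target, next_state, reward,
                     not_done, discount):
    # Everything computed inside the no_grad block is excluded from the
    # autograd graph, so no .detach() calls are needed afterwards.
    with torch.no_grad():
        next_action = actor_target(next_state)
        target_q = reward + not_done * discount * critic_target(next_state, next_action)
    return target_q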
And the author of the SAC algorithm uses tf.stop_gradient() to compute value_target and q_hat (https://github.com/haarnoja/sac/blob/master/sac/algos/sac.py#L256):
ys = tf.stop_gradient(self.scale_reward * self._rewards_ph + (1 - self._terminals_ph) * self._discount * vf_next_target_t)  # N
(See also line 330 of the same file, which involves self._vf_t.) Could you give me some suggestions about this problem, please?