I've done some minor testing in the past with detaching variables (in PyTorch specifically, not so much in TensorFlow) and found that, in general, it doesn't really affect learning. I tested by running the code multiple times, with different combinations of variables detached, and found virtually no difference that wasn't within the normal error bars of reinforcement learning.
In the majority of my code on YouTube, I don't bother with it, and the agents work anyway.
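For concreteness, here is a tiny self-contained PyTorch sketch (not taken from the repo) of what detaching a target actually changes: the loss value is identical either way, but the gradient that reaches the shared parameter differs.

import torch

def grad_wrt_w(detach_target: bool) -> float:
    # Toy setup: both the prediction and the "target" depend on the same parameter w.
    w = torch.tensor([2.0], requires_grad=True)
    prediction = w * 3.0
    target = w * 5.0
    if detach_target:
        target = target.detach()  # treat the target as a constant
    loss = (prediction - target).pow(2).mean()
    loss.backward()
    return w.grad.item()

# With detach, gradients flow only through the prediction branch;
# without it, they also flow back through the target branch.
print(grad_wrt_w(detach_target=True))   # -24.0
print(grad_wrt_w(detach_target=False))  #  16.0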
Thanks a lot!
Thanks for sharing your code! There is something that bothers me: in the PyTorch SAC code, do we need to use with torch.no_grad() or detach() when computing value_target and q_hat?
value_target = critic_value - log_probs # line 96
q_hat = self.scale*reward + self.gamma*value_ # line 116
I think we need to stop the gradient computation for them:
value_target = value_target.detach()
q_hat = q_hat.detach()
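If you did that, the two losses would look roughly like the sketch below. This is only a sketch: the argument names (value, critic_value, log_probs, q1_old_policy, q2_old_policy, value_, reward, scale, gamma) are assumptions based on the lines quoted above, not the repo's actual learn() method.

import torch.nn.functional as F

def value_and_critic_losses(value, critic_value, log_probs,
                            q1_old_policy, q2_old_policy,
                            value_, reward, scale, gamma):
    # Value-network target: treat the critic output and log-probs as constants.
    value_target = (critic_value - log_probs).detach()
    value_loss = 0.5 * F.mse_loss(value, value_target)

    # Critic target: the bootstrapped next-state value is treated as a constant.
    q_hat = (scale * reward + gamma * value_).detach()
    critic_loss = (0.5 * F.mse_loss(q1_old_policy, q_hat)
                   + 0.5 * F.mse_loss(q2_old_policy, q_hat))
    return value_loss, critic_loss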
The author of the TD3 algorithm uses with torch.no_grad() to compute target_Q (https://github.com/sfujim/TD3/blob/master/TD3.py#L110):
with torch.no_grad():
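The same effect with a context manager would look roughly like this (a simplified sketch, not the actual TD3 code: the target-policy smoothing noise and the twin-critic minimum are omitted, and the network/tensor names are assumptions):

import torch

def compute_target_q(critic_target, actor_target, next_state, reward,
                     not_done, discount):
    # Everything computed inside the no_grad block is excluded from the
    # autograd graph, so no .detach() calls are needed afterwards.
    with torch.no_grad():
        next_action = actor_target(next_state)
        target_q = reward + not_done * discount * critic_target(next_state, next_action)
    return target_q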
And the author of the SAC algorithm uses tf.stop_gradient() to compute value_target and q_hat (https://github.com/haarnoja/sac/blob/master/sac/algos/sac.py#L256):
ys = tf.stop_gradient(self.scale_reward * self._rewards_ph + (1 - self._terminals_ph) * self._discount * vf_next_target_t)  # N
(See also line 330 of the same file, which involves self._vf_t.) Could you give me some suggestions about this problem, please?