Same question! Moreover, in the original SAC implementation, the authors use alpha, not log alpha:

```python
with tf.GradientTape() as tape:
    alpha_losses = -1.0 * (
        self._alpha * tf.stop_gradient(log_pis + self._target_entropy))
    # NOTE(hartikainen): It's important that we take the average here,
    # otherwise we end up effectively having `batch_size` times too
    # large learning rate.
    alpha_loss = tf.nn.compute_average_loss(alpha_losses)
```
@xingdi-eric-yuan I think I found an explanation from the author of the other implementation: https://github.com/ku2482/sac-discrete.pytorch/issues/3#issuecomment-561274577
Thanks, @Howuhh! I recently compared using alpha vs. log alpha in that loss (in a discrete SAC setting), and I can confirm there is no noticeable difference in the agents' performance. Although I don't observe a clear advantage to using log alpha (in terms of performance), at least it does not hurt.
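In case it helps anyone else reading this thread, here is a minimal, hypothetical PyTorch sketch contrasting the two parameterizations. It is not code from softlearning or sac-discrete.pytorch; `log_pis` and `target_entropy` are placeholder values used only to illustrate the gradients.

```python
# Minimal sketch of the two temperature-loss parameterizations discussed above.
import torch

target_entropy = -1.0        # hypothetical target entropy
log_pis = torch.randn(256)   # hypothetical log pi(a|s) for a sampled batch

# Variant A: optimize alpha directly (mirrors the softlearning loss quoted above).
alpha = torch.tensor(1.0, requires_grad=True)
alpha_loss = -(alpha * (log_pis + target_entropy).detach()).mean()
alpha_loss.backward()        # d loss / d alpha = -mean(log_pis + target_entropy)

# Variant B: optimize log(alpha) directly in the loss (the pattern being asked
# about); alpha = log_alpha.exp() then stays positive by construction.
log_alpha = torch.zeros(1, requires_grad=True)
log_alpha_loss = -(log_alpha * (log_pis + target_entropy).detach()).mean()
log_alpha_loss.backward()    # same gradient expression, but w.r.t. log_alpha

# The gradient expressions are identical, so the implied update to alpha differs
# only by a factor of alpha (chain rule through exp), i.e. an implicit rescaling
# of the temperature learning rate.
```

Since the gradient expression is the same in both cases, switching between them mainly rescales the effective step size on alpha (and keeps it positive in the log parameterization), which seems consistent with the "no noticeable difference" observation above.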
Hi all,
I might have misunderstood, but shouldn't one use `self.alpha` rather than `self.log_alpha` HERE? Thanks.