p-christ / Deep-Reinforcement-Learning-Algorithms-with-PyTorch

PyTorch implementations of deep reinforcement learning algorithms and environments
MIT License

Calculate Entropy Tuning Loss in SAC/SAC Discrete #65

Closed xingdi-eric-yuan closed 3 years ago

xingdi-eric-yuan commented 3 years ago

Hi all,

I might have misunderstood, but shouldn't one use self.alpha rather than self.log_alpha HERE?

Thanks.

Howuhh commented 3 years ago

Same question! Moreover, in the original SAC implementation the authors use alpha, not log alpha:

with tf.GradientTape() as tape:
    alpha_losses = -1.0 * (
        self._alpha * tf.stop_gradient(log_pis + self._target_entropy))
    # NOTE(hartikainen): It's important that we take the average here,
    # otherwise we end up effectively having `batch_size` times too
    # large learning rate.
    alpha_loss = tf.nn.compute_average_loss(alpha_losses)

Howuhh commented 3 years ago

@xingdi-eric-yuan I think I found an explanation from the author of the other implementation: https://github.com/ku2482/sac-discrete.pytorch/issues/3#issuecomment-561274577

xingdi-eric-yuan commented 3 years ago

Thanks, @Howuhh ! I recently compared using alpha vs. log alpha in that loss (in a discrete SAC setting), and I can confirm there is no noticeable difference in the agents' performance. Although I don't see a clear advantage to using log alpha (in terms of performance), at least it does not hurt.
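
For readers following along, here is a minimal PyTorch sketch (not the repository's actual code) that contrasts the two temperature-loss variants discussed in this thread for a discrete policy. The batch of random logits, the 0.98 * log(num_actions) target-entropy heuristic, and the small epsilon are illustrative assumptions.

import torch

num_actions = 4
# Common discrete-SAC heuristic: target 98% of the maximum possible entropy.
target_entropy = -0.98 * torch.log(torch.tensor(1.0 / num_actions))

# Optimising log(alpha) keeps alpha = exp(log_alpha) strictly positive.
log_alpha = torch.zeros(1, requires_grad=True)
alpha = log_alpha.exp()

logits = torch.randn(8, num_actions)               # placeholder policy outputs for a batch of 8 states
action_probs = torch.softmax(logits, dim=-1)
log_action_probs = torch.log(action_probs + 1e-8)  # epsilon avoids log(0)

# Expected log-probability of the policy (i.e. minus its entropy), detached so the
# temperature update does not backpropagate into the policy network.
expected_log_pi = (action_probs * log_action_probs).sum(dim=-1).detach()

# Variant questioned in this issue: scale by log_alpha.
loss_log_alpha = -(log_alpha * (expected_log_pi + target_entropy)).mean()

# Variant matching the original (continuous) SAC reference code above: scale by alpha.
loss_alpha = -(alpha * (expected_log_pi + target_entropy)).mean()

Either loss is then minimised with a separate optimiser on log_alpha; as noted above, the two variants appear to perform similarly in practice.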