While checking the SAC_Discrete code, I noticed that it does not define its own calculate_entropy_tuning_losses function; it inherits the one from SAC.
But according to the SAC_Discrete paper, equation 11 vs. 9 (the latter is for continuous SAC), in the discrete case the expectation E is instead computed by weighting -alpha*(log_pi + target_entropy) by the probability the agent assigns to each action (pi), not by sampling a single log_pi.
Shouldn't SAC_Discrete have its own entropy loss function then?
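
To make the difference concrete, here is a minimal, framework-free sketch of the two estimators as I understand them. The function names, the example action probabilities, and the alpha and target_entropy values are all made up for illustration; none of this is taken from the repo, and a real implementation would operate on batched tensors and on log_alpha rather than on plain floats.

```python
import math


def sampled_alpha_loss(alpha, log_pi_sampled, target_entropy):
    """Single-sample estimate: -alpha * (log_pi + target_entropy)
    for the log-probability of ONE sampled action."""
    return -alpha * (log_pi_sampled + target_entropy)


def expected_alpha_loss(alpha, action_probs, target_entropy):
    """Exact expectation over the discrete action set:
    sum_a pi(a|s) * [-alpha * (log pi(a|s) + target_entropy)]."""
    total = 0.0
    for p in action_probs:
        if p > 0.0:  # a zero-probability action contributes nothing
            total += p * (-alpha * (math.log(p) + target_entropy))
    return total


# Illustrative values only (assumptions, not from the paper or the repo).
alpha = 0.2
target_entropy = 0.5
action_probs = [0.7, 0.2, 0.1]

# The sampled estimate depends on which action happened to be drawn ...
per_action_estimates = [
    sampled_alpha_loss(alpha, math.log(p), target_entropy) for p in action_probs
]

# ... while the probability-weighted version is deterministic given pi.
exact_loss = expected_alpha_loss(alpha, action_probs, target_entropy)
```

Both versions have the same expected value, so the sampled one is not biased; the weighted one just removes the sampling variance, which is cheap to do when all action probabilities are already available.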