On page 393, the objective for alpha is written as the product of alpha and the sum of the target-entropy heuristic and a log-likelihood term. However, the corresponding code contains the line
" alpha_loss = -(self.policy_model.logalpha * target_alpha).mean() "
which multiplies logalpha (the logarithm of alpha) rather than alpha itself. The text and the code are therefore inconsistent.
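To make the difference concrete, here is a minimal sketch comparing the gradient with respect to log alpha under the two forms. It assumes that the bracketed term (log pi plus the target entropy) is detached and treated as a constant, as the name target_alpha in the code line suggests. The function names and the sample values (three log-probabilities and a target entropy of -1.0) are hypothetical and are not taken from the book.

```python
import math

def alpha_loss_with_alpha(log_alpha, log_pi, target_entropy):
    # Objective as written in the text: -alpha * (log pi + target entropy),
    # averaged over the batch; the bracketed term is a constant here.
    alpha = math.exp(log_alpha)
    target = [lp + target_entropy for lp in log_pi]
    return -sum(alpha * t for t in target) / len(target)

def alpha_loss_with_logalpha(log_alpha, log_pi, target_entropy):
    # Form used in the code line: log alpha replaces alpha in the product.
    target = [lp + target_entropy for lp in log_pi]
    return -sum(log_alpha * t for t in target) / len(target)

def grad_wrt_log_alpha(loss_fn, log_alpha, log_pi, target_entropy, eps=1e-6):
    # Central finite difference, since the optimized parameter is log alpha.
    up = loss_fn(log_alpha + eps, log_pi, target_entropy)
    down = loss_fn(log_alpha - eps, log_pi, target_entropy)
    return (up - down) / (2 * eps)

# Hypothetical sample values.
log_alpha = math.log(0.2)
log_pi = [-1.0, -0.5, -2.0]
target_entropy = -1.0

g_alpha = grad_wrt_log_alpha(alpha_loss_with_alpha, log_alpha, log_pi, target_entropy)
g_log = grad_wrt_log_alpha(alpha_loss_with_logalpha, log_alpha, log_pi, target_entropy)
print(g_alpha, g_log, g_alpha / g_log)
```

In this sketch both gradients have the same sign, and their ratio equals alpha. Under these assumptions the code moves log alpha in the same direction as the stated objective, but with a step that is not scaled by alpha, so the text and the code describe different losses even if the update direction agrees.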