The formula and code of SAC are inconsistent

mimoralea / gdrl

Grokking Deep Reinforcement Learning

https://www.manning.com/books/grokking-deep-reinforcement-learning

BSD 3-Clause "New" or "Revised" License


The formula and code of SAC are inconsistent #23

Open MarginalCentrality opened 2 years ago

MarginalCentrality commented 2 years ago

On page 393, the objective for alpha is written as the product of alpha and the sum of the target entropy heuristic and a likelihood term. The corresponding code, however, contains the line `alpha_loss = -(self.policy_model.logalpha * target_alpha).mean()`, which multiplies by `logalpha` rather than by alpha itself. The formula and the code are inconsistent.
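To make the difference concrete, here is a minimal sketch comparing the two forms. It assumes that `target_alpha` in the code holds the per-sample sum of the log-probability and the target entropy, treated as a constant; that is my reading, not something confirmed by the book. The batch values and the helper names `loss_with_alpha`, `loss_with_log_alpha`, and `grad` are hypothetical. Both losses are differentiated with respect to `logalpha`, since that appears to be the parameter the code optimizes.

```python
import math

def loss_with_alpha(log_alpha, targets):
    """Formula-style loss as described above: -alpha * target, averaged over the batch."""
    alpha = math.exp(log_alpha)
    return -sum(alpha * t for t in targets) / len(targets)

def loss_with_log_alpha(log_alpha, targets):
    """Code-style loss: -(logalpha * target_alpha).mean()."""
    return -sum(log_alpha * t for t in targets) / len(targets)

def grad(f, x, targets, eps=1e-6):
    """Central finite-difference derivative of f with respect to log_alpha."""
    return (f(x + eps, targets) - f(x - eps, targets)) / (2 * eps)

# Hypothetical batch of (log_pi + target_entropy) values, held constant.
targets = [0.5, -0.2, 0.9]
log_alpha = -1.0

g_alpha = grad(loss_with_alpha, log_alpha, targets)
g_log_alpha = grad(loss_with_log_alpha, log_alpha, targets)

print("gradient, formula-style loss:", g_alpha)
print("gradient, code-style loss:   ", g_log_alpha)
print("ratio:", g_alpha / g_log_alpha, "vs alpha:", math.exp(log_alpha))
```

Under these assumptions the two gradients differ only by the positive factor alpha = exp(logalpha), so they point in the same direction but have different magnitudes. That would suggest the two losses are not identical, which is the inconsistency raised here, though the sketch does not say which form the author intended.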