Closed: lorenzosteccanella closed this issue 2 years ago
Hi,
I think the sign of the entropy term in the A2C policy loss is wrong:
if new_action_entropy is not None: act_policy_loss += self.entropy_weight * new_action_entropy.mean()
Instead, it should be:
if new_action_entropy is not None: act_policy_loss -= self.entropy_weight * new_action_entropy.mean()
Best,
Lorenzo
Hi, the entropy weight is negative here: https://github.com/iffiX/machin/blob/7fa986b1bafdefff117d6ff73d14644a5488de9d/machin/frame/algorithms/a2c.py#L142
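For anyone landing here later, a minimal sketch of why the two forms agree when the weight carries the sign. This is not the library's code: the helper names and the numbers are made up for illustration, and plain floats stand in for tensors.

```python
# Hypothetical helpers, not part of machin: they only mirror the two
# one-liners quoted in this issue, with plain floats instead of tensors.

def add_entropy_term(policy_loss, entropy, entropy_weight):
    """Form used in the quoted code: loss += weight * entropy."""
    return policy_loss + entropy_weight * entropy


def subtract_entropy_term(policy_loss, entropy, entropy_coef):
    """Form proposed in the issue: loss -= coef * entropy."""
    return policy_loss - entropy_coef * entropy


# Illustrative values only (assumptions, not taken from the repository).
policy_loss = 1.5
entropy = 0.7
coef = 0.01

# Adding with a negative weight...
added = add_entropy_term(policy_loss, entropy, -coef)
# ...matches subtracting with the positive coefficient.
subtracted = subtract_entropy_term(policy_loss, entropy, coef)

assert abs(added - subtracted) < 1e-12
# Either way, higher entropy lowers the loss, so minimizing the loss
# pushes the policy toward more exploration.
assert added < policy_loss
```

So whether `+=` is correct depends on the sign of the weight that is passed in: with a negative weight, `+=` already acts as an entropy bonus.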
OK, I hadn't read that!
Thanks!