iffiX / machin

Reinforcement learning library (framework) designed for PyTorch; implements DQN, DDPG, A2C, PPO, SAC, MADDPG, A3C, APEX, IMPALA ...
MIT License

A2C entropy minimized instead of maximized #27

Closed: lorenzosteccanella closed this issue 2 years ago

lorenzosteccanella commented 2 years ago

Hi,

I think the sign of the entropy term in the A2C policy loss is wrong:

if new_action_entropy is not None:
    act_policy_loss += self.entropy_weight * new_action_entropy.mean()

Instead, it should be:

if new_action_entropy is not None:
    act_policy_loss -= self.entropy_weight * new_action_entropy.mean()

Best,

Lorenzo

iffiX commented 2 years ago

Hi, the entropy weight is negative here: https://github.com/iffiX/machin/blob/7fa986b1bafdefff117d6ff73d14644a5488de9d/machin/frame/algorithms/a2c.py#L142
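
To illustrate why the `+=` form quoted in the issue can still maximize entropy, here is a minimal sketch in plain Python. It is not machin's code: the helper names `entropy` and `policy_loss_with_entropy`, the example distributions, and the weight value `-0.01` are all assumptions made for illustration. It only shows that adding a negatively weighted entropy term gives the higher-entropy policy the lower loss.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def policy_loss_with_entropy(base_loss, probs, entropy_weight):
    """Hypothetical helper: add the weighted entropy to the loss,
    mirroring the `+=` form quoted in the issue."""
    return base_loss + entropy_weight * entropy(probs)


# Two illustrative action distributions (assumed values).
uniform = [0.25, 0.25, 0.25, 0.25]  # high entropy
peaked = [0.97, 0.01, 0.01, 0.01]   # low entropy

# With a negative weight, the higher-entropy policy gets the lower loss,
# so minimizing the loss pushes entropy up.
loss_uniform = policy_loss_with_entropy(1.0, uniform, entropy_weight=-0.01)
loss_peaked = policy_loss_with_entropy(1.0, peaked, entropy_weight=-0.01)
print(loss_uniform < loss_peaked)
```

With a positive weight the ordering flips, which is the behavior the issue was worried about; the sign of the weight, not the `+=` itself, decides the direction.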

lorenzosteccanella commented 2 years ago

OK, I hadn't read that!

Thanks!