A2C entropy minimized instead of maximized

iffiX / machin

Reinforcement learning library (framework) designed for PyTorch; implements DQN, DDPG, A2C, PPO, SAC, MADDPG, A3C, APEX, IMPALA ...

MIT License


Closed: lorenzosteccanella closed this issue 2 years ago

lorenzosteccanella commented 2 years ago

Hi,

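I think the entropy term in the A2C loss has the wrong sign. The current code adds the entropy to the loss, so minimizing the loss minimizes the entropy: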

if new_action_entropy is not None:
    act_policy_loss += self.entropy_weight * new_action_entropy.mean()

Instead, it should be:

if new_action_entropy is not None:
    act_policy_loss -= self.entropy_weight * new_action_entropy.mean()
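
The effect of the sign can be checked on a toy problem. The following is a minimal sketch, not the machin implementation: it runs plain gradient descent on a loss consisting only of `sign * entropy_weight * H`, where `H` is the entropy of a softmax over three logits. The helper names (`softmax`, `entropy`, `entropy_grad`, `train`), the starting logits, the step count and the learning rate are all hypothetical choices made for illustration.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    # Shannon entropy H = -sum(p * log p), in nats.
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def entropy_grad(logits):
    # Analytic gradient of H with respect to the logits:
    # dH/dz_j = -p_j * (log p_j + H)
    probs = softmax(logits)
    h = entropy(probs)
    return [-p * (math.log(p) + h) for p in probs]

def train(sign, entropy_weight=0.1, steps=200, lr=1.0):
    # Hypothetical toy loop: the loss is only sign * entropy_weight * H.
    # sign=+1.0 mimics `loss += entropy_weight * H`,
    # sign=-1.0 mimics `loss -= entropy_weight * H`.
    logits = [1.0, 0.0, -1.0]
    for _ in range(steps):
        grad_h = entropy_grad(logits)
        logits = [z - lr * sign * entropy_weight * g
                  for z, g in zip(logits, grad_h)]
    return entropy(softmax(logits))

initial = entropy(softmax([1.0, 0.0, -1.0]))
added = train(+1.0)       # entropy term added to the loss
subtracted = train(-1.0)  # entropy term subtracted from the loss

print(f"initial entropy:      {initial:.4f}")
print(f"after '+=' training:  {added:.4f}")
print(f"after '-=' training:  {subtracted:.4f}")
```

With a positive weight, the `+=` form drives the entropy down and the `-=` form drives it up toward the uniform-distribution maximum of `log(3)`. The same arithmetic means that a negative `entropy_weight` combined with `+=` also raises the entropy, so whether the original line is a bug depends on the sign convention the library intends for that parameter.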

Best,

Lorenzo

iffiX commented 2 years ago

lorenzosteccanella commented 2 years ago

OK, I hadn't read that!

Thanks!