sweetice / Deep-reinforcement-learning-with-pytorch

PyTorch implementation of DQN, AC, ACER, A2C, A3C, PG, DDPG, TRPO, PPO, SAC, TD3 and ....
MIT License
3.75k stars, 837 forks

Temperature factor missing in SAC !!! #36

Open Darkness-hy opened 2 years ago

Darkness-hy commented 2 years ago

log_prob should be multiplied by the temperature factor (alpha) when calculating pi_loss in ALL implementations of SAC.
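For reference, the policy objective in the standard SAC formulation can be written as below. This is a sketch under the assumption that `log_prob` corresponds to $\log \pi_\phi(a \mid s)$ and `pi_loss` to $J_\pi(\phi)$; the symbols $\phi$, $\theta$ and $\mathcal{D}$ (policy parameters, critic parameters, replay buffer) are notation introduced here, not names from the repository.

$$
J_\pi(\phi) = \mathbb{E}_{s \sim \mathcal{D},\; a \sim \pi_\phi(\cdot \mid s)}\left[\, \alpha \log \pi_\phi(a \mid s) - Q_\theta(s, a) \,\right]
$$

Dropping $\alpha$ from the first term is the same as fixing the temperature at $\alpha = 1$, so the entropy bonus can no longer be tuned against the Q-value term.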

Darkness-hy commented 2 years ago

Also, the output of the "log_std_head" layer in the Actor network in SAC does not need to go through ReLU, because what we need is the LOG of std rather than the std value itself.
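A small numeric sketch of why this matters, using only the Python standard library rather than the repository's network code. The helper names and the bounds `LOG_STD_MIN` / `LOG_STD_MAX` are hypothetical, chosen for illustration. Passing the head output through ReLU forces log(std) >= 0, so std can never fall below 1; clamping the raw output to a wide range is one possible alternative that still allows small standard deviations.

```python
import math

# Hypothetical bounds for illustration only; not taken from the repository.
LOG_STD_MIN, LOG_STD_MAX = -20.0, 2.0


def relu(x: float) -> float:
    """Plain ReLU on a scalar."""
    return max(0.0, x)


def std_with_relu(raw: float) -> float:
    """std obtained when the log_std head output goes through ReLU first."""
    return math.exp(relu(raw))


def std_with_clamp(raw: float) -> float:
    """std obtained when the raw head output is only clamped to a wide range."""
    return math.exp(min(max(raw, LOG_STD_MIN), LOG_STD_MAX))


# Example raw outputs of the log_std head (made-up values).
raw_outputs = [-3.0, -0.5, 0.0, 1.0]
relu_stds = [std_with_relu(r) for r in raw_outputs]
clamp_stds = [std_with_clamp(r) for r in raw_outputs]

for raw, s_relu, s_clamp in zip(raw_outputs, relu_stds, clamp_stds):
    print(f"raw={raw:+.1f}  std(ReLU)={s_relu:.4f}  std(clamp)={s_clamp:.4f}")
```

With ReLU, every negative raw output collapses to std = 1, so the policy cannot become more deterministic than a unit-variance Gaussian; the clamped version keeps the negative log values and therefore the small std values.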