openai / spinningup

An educational resource to help anyone learn deep reinforcement learning.
https://spinningup.openai.com/
MIT License

Pytorch SAC alpha sign #304

Open jose-alatorre-harvard opened 3 years ago

jose-alatorre-harvard commented 3 years ago

Hi,

In line 241 of sac.py: loss_pi = (alpha * logp_pi - q_pi).mean()

Shouldn't it be loss_pi = (q_pi - alpha * logp_pi).mean()?

I believe the signs are wrong.

Harimus commented 3 years ago

Optimization (taking learning steps) in PyTorch is done by gradient descent on the loss function, whereas policy gradient methods improve the actor by gradient ascent on the objective. Flipping the sign here makes gradient descent on the loss have the same effect as gradient ascent on the objective, so the code is correct as written. The compute_loss(obs, act, weight) in Spinning Up Intro Part 3 uses the same trick.
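
A minimal sketch of the sign-flip argument, using random tensors as stand-ins for the q_pi and logp_pi produced in sac.py (the real quantities depend on the policy parameters; this is only illustrative):

```python
import torch

# Hypothetical stand-ins for the SAC quantities (illustrative shapes only).
logp_pi = torch.randn(32, requires_grad=True)  # log-probs of sampled actions
q_pi = torch.randn(32)                         # Q-values of those actions
alpha = 0.2                                    # entropy temperature

# The policy objective we want to MAXIMIZE is E[q_pi - alpha * logp_pi].
# PyTorch optimizers only minimize, so we minimize the negated objective:
loss_pi = (alpha * logp_pi - q_pi).mean()      # == -(q_pi - alpha * logp_pi).mean()

loss_pi.backward()
# A descent step on loss_pi moves the parameters in the same direction as an
# ascent step on the original objective, which is why the sign in sac.py is
# not flipped the way the question suggests.
```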