openai / spinningup

An educational resource to help anyone learn deep reinforcement learning.
https://spinningup.openai.com/
MIT License
9.96k stars 2.19k forks source link

Possible error in the calculation of the log probabilities in SAC #314

Open PaoloAE opened 3 years ago

PaoloAE commented 3 years ago

https://github.com/openai/spinningup/blob/038665d62d569055401d91856abb287263096178/spinup/algos/pytorch/sac/core.py#L53-L67

In line 64 of the code above, the squashing function (tanh) is applied to pi_action, producing an action pi_action which lives in the interval [-1,1]. In line 60, logp_pi is set to the corresponding log probability of such action. However, in line 65 pi_action is multiplied by self.act_limit, which is a scaling factor. Now pi_action lives in the interval [-self.act_limit,self.act_limit]. According to the change of variables formula, also the corresponding logp_pi should change. Assuming that self.act_limit > 0, I think that the correct result would be obtained by adding, afterline 60, the following: logp_pi -= np.log(self.act_limit)

vermouth1992 commented 3 years ago

I believe most environments tested have act_limit=1, so it would be the same. If act_limit is not one, then you are correct.