In line 64 of the code above, the squashing function (`tanh`) is applied to `pi_action`, producing an action that lives in the interval `[-1, 1]`. In line 60, `logp_pi` is set to the corresponding log probability of that action.

However, in line 65 `pi_action` is multiplied by `self.act_limit`, a scaling factor, so `pi_action` now lives in the interval `[-self.act_limit, self.act_limit]`. By the change of variables formula, the corresponding `logp_pi` should change as well. Assuming that `self.act_limit > 0`, I think the correct result would be obtained by adding, after line 60, the following:

```python
logp_pi -= np.log(self.act_limit)
```
https://github.com/openai/spinningup/blob/038665d62d569055401d91856abb287263096178/spinup/algos/pytorch/sac/core.py#L53-L67
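The change of variables formula behind this can be checked numerically on a simple case: if `Y = c * X` with `c > 0`, then `log p_Y(y) = log p_X(y / c) - log(c)`. The sketch below (plain-Python, with `c` standing in for `self.act_limit` and a Gaussian standing in for the pre-squash policy distribution; these names are illustrative, not from the repo) verifies that subtracting `log(c)` is exactly what makes the densities agree:

```python
import math

def normal_logpdf(x, mu=0.0, sigma=1.0):
    # Log density of N(mu, sigma^2) at x.
    return -0.5 * math.log(2 * math.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)

c = 2.0  # stand-in for self.act_limit (assumed > 0)
y = 0.7

# If X ~ N(0, 1) and Y = c * X, then Y ~ N(0, c^2),
# so log p_Y(y) can be computed directly...
lhs = normal_logpdf(y, sigma=c)

# ...or via the change of variables formula:
# log p_Y(y) = log p_X(y / c) - log(c)
rhs = normal_logpdf(y / c) - math.log(c)

assert abs(lhs - rhs) < 1e-12
```

Note that for a `d`-dimensional action with `logp_pi` summed over dimensions, the same argument would subtract `log(c)` once per dimension.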