Stupid issue

I implemented my own version of SAC, and the log probability of the policy sometimes went above 0 when using the version given in the paper.

According to what I read here (p. 6), I think the squashing correction should be added, not subtracted, since the pdf is multiplied by the determinant of the Jacobian under a change of variables. But then this incentivises the agent to just set actions to 1 to get a low log pi.
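To make the sign question concrete, here is a minimal numerical sketch (not the repository's code). It assumes a 1-D Gaussian over the pre-squash value `u` and an action `a = tanh(u)`. The function names, the grid size, and the small `eps` inside the log are my own choices for illustration. It integrates the resulting density over `a` in (-1, 1) for both signs of the correction, which is one way to check which form gives a valid pdf.

```python
import numpy as np


def gaussian_log_prob(u, mean, std):
    """Log density of N(mean, std^2) evaluated at u."""
    return -0.5 * ((u - mean) / std) ** 2 - np.log(std) - 0.5 * np.log(2.0 * np.pi)


def squashed_log_prob(u, mean, std, sign=-1.0, eps=1e-6):
    """Candidate log pi(a) for a = tanh(u).

    sign=-1.0 subtracts log(1 - tanh(u)^2) from the Gaussian log density,
    sign=+1.0 adds it. eps is an assumed constant that only guards log(0).
    """
    correction = np.log(1.0 - np.tanh(u) ** 2 + eps)
    return gaussian_log_prob(u, mean, std) + sign * correction


def integrate_density(sign, mean=0.0, std=1.0, n=200001):
    """Trapezoid-rule integral of exp(log pi(a)) over a in (-1, 1)."""
    a = np.linspace(-1.0 + 1e-6, 1.0 - 1e-6, n)
    u = np.arctanh(a)  # invert the squashing to recover the pre-squash value
    density = np.exp(squashed_log_prob(u, mean, std, sign=sign))
    return float(np.sum(0.5 * (density[1:] + density[:-1]) * np.diff(a)))


if __name__ == "__main__":
    print("subtract correction:", integrate_density(sign=-1.0))
    print("add correction:     ", integrate_density(sign=+1.0))
    # A narrow Gaussian evaluated at its mean, where tanh is nearly linear.
    print("log pi at u=0, std=0.1:", squashed_log_prob(0.0, 0.0, 0.1))
```

In this sketch the subtracted form integrates to about 1 over the action range, while the added form falls well short of 1. The last line also shows that a log density above 0 is possible with the subtracted form whenever the density itself exceeds 1, for example with a small standard deviation.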

I am pretty sure I am missing something here. Can you please explain how you arrived at the squashing correction given in the paper?

haarnoja / sac

Stupid issue #18