Hi
I implemented my own version of SAC, and the policy's log probability sometimes went above 0 when I used the squashing correction as given in the paper.
According to what I read here (p. 6), I think the squashing correction should be added, not subtracted, since the pdf is multiplied by the determinant of the Jacobian under a change of variables. But adding it would incentivise the agent to simply set actions to 1 to get a low log pi.
I am pretty sure I am missing something here. Could you please explain how you arrived at the squashing correction given in the paper?
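
For anyone who wants to check the sign numerically, here is a minimal sketch that compares the two conventions. It assumes a 1-D Gaussian over the pre-squash variable `u` with action `a = tanh(u)`. The function names (`gaussian_log_prob`, `squashed_log_prob`, `total_mass`) and the `sign` argument are hypothetical helpers for illustration and are not taken from the paper or any SAC codebase. A valid density over `a` must integrate to 1 on (-1, 1), so the sketch checks which sign satisfies that.

```python
import numpy as np


def gaussian_log_prob(u, mean=0.0, std=1.0):
    """Log-density of N(mean, std^2) evaluated at the pre-squash value u."""
    return -0.5 * ((u - mean) / std) ** 2 - np.log(std) - 0.5 * np.log(2.0 * np.pi)


def squashed_log_prob(u, mean=0.0, std=1.0, sign=-1.0):
    """Candidate log-density of a = tanh(u).

    sign=-1.0 subtracts log(1 - tanh(u)^2); sign=+1.0 adds it.
    """
    correction = np.log(1.0 - np.tanh(u) ** 2)
    return gaussian_log_prob(u, mean, std) + sign * correction


def total_mass(sign, mean=0.0, std=1.0, n=200001):
    """Integrate the candidate density over a in (-1, 1).

    The integral is evaluated on a grid in u, using da = (1 - tanh(u)^2) du.
    """
    u = np.linspace(-8.0, 8.0, n)
    density_in_a = np.exp(squashed_log_prob(u, mean, std, sign))
    da_du = 1.0 - np.tanh(u) ** 2
    du = u[1] - u[0]
    return float(np.sum(density_in_a * da_du) * du)


if __name__ == "__main__":
    print("mass with correction subtracted:", total_mass(-1.0))
    print("mass with correction added:     ", total_mass(+1.0))
    # A narrow Gaussian gives a log-density above 0 at its mode.
    print("log pi at u=0, std=0.1:         ", squashed_log_prob(0.0, std=0.1))
```

Under these assumptions, only the subtracted form integrates to 1. The last line also shows that a log-density above 0 is possible for a continuous distribution, since a density (unlike a probability) can exceed 1 when the distribution is narrow.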