haarnoja / sac

Soft Actor-Critic

Stupid issue #18

Closed: ghost closed this issue 6 years ago

ghost commented 6 years ago

Hi

I implemented my own version of SAC, and the log probability of the policy sometimes went above 0 when using the version given in the paper.

According to what I read here (p. 6), I think the squashing correction should be added, not subtracted, since the density is multiplied by the determinant of the Jacobian when calculating the pdf. But then this incentivises the agent to just set actions to 1 to get a low log pi.

I am pretty sure I am missing something here. Can you please explain how you arrived at the squashing correction given in the paper?

ghost commented 6 years ago

Sorry, I got confused about some very basic things.
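
For anyone landing here with the same question, here is a minimal numerical sketch of the sign of the correction. It assumes a 1-D Gaussian pre-squash sample u and action a = tanh(u). By the change-of-variables rule, the density of a is the density of u multiplied by |du/da| = 1 / (1 - tanh(u)^2), so in log space log(1 - tanh(u)^2) is subtracted. The function names and the `sign` parameter below are hypothetical, made up for this illustration, and are not taken from the repository. The check integrates the resulting density over a in (-1, 1): only the subtracted form should come out close to 1.

```python
import numpy as np


def gaussian_log_prob(u, mean, std):
    """Log-density of a 1-D Gaussian N(mean, std^2) evaluated at u."""
    return -0.5 * ((u - mean) / std) ** 2 - np.log(std) - 0.5 * np.log(2.0 * np.pi)


def squashed_log_prob(u, mean, std, sign=-1.0):
    """Log-density of a = tanh(u) when u ~ N(mean, std^2).

    sign=-1 subtracts log(1 - tanh(u)^2), the form given in the paper.
    sign=+1 adds it, the variant proposed in the question (hypothetical
    switch, only here for comparison).
    """
    correction = np.log(1.0 - np.tanh(u) ** 2)
    return gaussian_log_prob(u, mean, std) + sign * correction


def integral_over_actions(mean, std, sign=-1.0, n=200_001):
    """Trapezoid-rule integral of exp(log-density) over a in (-1, 1)."""
    # Stay slightly inside the open interval so arctanh stays finite.
    a = np.linspace(-1.0 + 1e-6, 1.0 - 1e-6, n)
    u = np.arctanh(a)
    density = np.exp(squashed_log_prob(u, mean, std, sign))
    return float(np.sum(0.5 * (density[1:] + density[:-1]) * np.diff(a)))


if __name__ == "__main__":
    # Subtracted correction: should integrate to roughly 1.
    print(integral_over_actions(0.0, 1.0, sign=-1.0))
    # Added correction: does not integrate to 1, so it is not a valid density.
    print(integral_over_actions(0.0, 1.0, sign=+1.0))
    # A narrow policy gives a positive log-density, which is fine.
    print(squashed_log_prob(0.0, 0.0, 0.1))
```

Two things follow from this sketch. First, log pi above 0 is not a bug in itself: pi is a probability density, not a probability, so it can exceed 1 when the policy is narrow. Second, log(1 - tanh(u)^2) is never positive, so subtracting it raises log pi as the action approaches the boundary rather than lowering it.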