pranz24 / pytorch-soft-actor-critic

PyTorch implementation of soft actor critic
MIT License

The bound enforcement for log_prob in line 103 of model.py #44

Open Roboticyang opened 11 months ago


I do not agree, mathematically, with the bound enforcement applied to the log_prob offset in your Gauss_policy. When the pdf of x is transformed into the pdf of y = tanh(x) in the multivariate case, the offset should be the logarithm of the absolute value of the determinant of the Jacobian matrix of that transformation. Because tanh is applied element-wise, the Jacobian is a diagonal matrix, so the offset should be the logarithm of the product of its diagonal elements. Please let me know whether my understanding of how pdfs transform under an element-wise change of vector variables is correct.
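To make the argument concrete, here is a small numerical sketch of the change-of-variables rule I have in mind:

$$\log p_Y(y) = \log p_X(x) - \log\lvert\det J\rvert = \log p_X(x) - \sum_i \log\bigl(1 - \tanh^2 x_i\bigr), \qquad y = \tanh(x).$$

It is written in plain Python rather than PyTorch so it runs without dependencies, and it assumes a diagonal Gaussian over the pre-squash variable x. All helper names (`tanh_jacobian`, `log_abs_det`, `tanh_log_det_diag`, `squashed_log_prob`) are hypothetical and are not taken from model.py. The script builds the full Jacobian of tanh by finite differences, takes log|det J| with a general-purpose routine, and compares the result with the sum of the logs of the diagonal entries. This is a check of the math only, not the repository's implementation.

```python
import math


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def tanh_jacobian(x, h=1e-6):
    """Estimate the full Jacobian dy_i/dx_j of y = tanh(x) by central differences."""
    n = len(x)
    jac = [[0.0] * n for _ in range(n)]
    for j in range(n):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        for i in range(n):
            jac[i][j] = (math.tanh(xp[i]) - math.tanh(xm[i])) / (2.0 * h)
    return jac


def log_abs_det(m):
    """General log|det m| via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n = len(a)
    total = 0.0
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(a[r][k]))
        a[k], a[p] = a[p], a[k]  # a row swap only flips the sign, which abs() ignores
        pivot = a[k][k]
        total += math.log(abs(pivot))
        for r in range(k + 1, n):
            f = a[r][k] / pivot
            for c in range(k, n):
                a[r][c] -= f * a[k][c]
    return total


def tanh_log_det_diag(x):
    """sum_i log(1 - tanh(x_i)^2), using log(1 - tanh^2 x) = 2(log 2 - x - softplus(-2x))."""
    return sum(2.0 * (math.log(2.0) - xi - softplus(-2.0 * xi)) for xi in x)


def gaussian_log_prob(x, mu, sigma):
    """Log density of a diagonal Gaussian N(mu, diag(sigma^2)) at x."""
    return sum(
        -0.5 * ((xi - m) / s) ** 2 - math.log(s) - 0.5 * math.log(2.0 * math.pi)
        for xi, m, s in zip(x, mu, sigma)
    )


def squashed_log_prob(x, mu, sigma):
    """Log density of y = tanh(x): log p_X(x) - log|det J|, with J diagonal."""
    return gaussian_log_prob(x, mu, sigma) - tanh_log_det_diag(x)


if __name__ == "__main__":
    x = [0.3, -1.2, 2.0]
    jac = tanh_jacobian(x)
    print("off-diagonal entries:", [jac[i][j] for i in range(3) for j in range(3) if i != j])
    print("log|det J| (full matrix):", log_abs_det(jac))
    print("sum of log diagonal     :", tanh_log_det_diag(x))
    print("log p_Y(tanh x)         :", squashed_log_prob(x, [0.0] * 3, [1.0] * 3))
```

The off-diagonal entries come out exactly zero, and the two log-determinant values agree, because the log of a product of diagonal entries is the sum of their logs. The softplus form of log(1 - tanh^2 x) is used only to avoid taking log(0) when tanh saturates for large |x|.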

I look forward to hearing from you.

Cheers,

Old Yang