Open zichunxx opened 7 months ago
Usually in SAC we use a Normal distribution coupled with tanh to bound the action space. However, after this transformation the resulting distribution is no longer Normal, so we cannot use its log_prob directly to get the log-probabilities of actions. This formula applies the change-of-variables correction for the tanh transformation and gives the correct log-probabilities for the TanhNormal distribution. See Appendix C in the original paper: https://arxiv.org/pdf/1801.01290.pdf
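To make the correction concrete, here is a minimal one-dimensional sketch in plain Python. It is not the CleanRL code itself: the function names (`normal_log_prob`, `tanh_normal_log_prob`, `integrate_density`) are hypothetical, and it assumes an action scale of 1, so the action is simply a = tanh(u) with u ~ Normal(mu, sigma). The log-density of a is then the Normal log-density of u minus log(1 - tanh(u)^2), with a small `eps` added inside the log.

```python
import math


def normal_log_prob(u, mu, sigma):
    """Log-density of Normal(mu, sigma) evaluated at u."""
    return (-0.5 * ((u - mu) / sigma) ** 2
            - math.log(sigma)
            - 0.5 * math.log(2.0 * math.pi))


def tanh_normal_log_prob(u, mu, sigma, eps=1e-6):
    """Log-density of a = tanh(u), given the pre-tanh sample u.

    Change of variables: log p(a) = log N(u; mu, sigma) - log(1 - tanh(u)^2).
    eps keeps the argument of the log strictly positive when tanh saturates.
    """
    a = math.tanh(u)
    return normal_log_prob(u, mu, sigma) - math.log(1.0 - a * a + eps)


def integrate_density(mu, sigma, n=20000):
    """Midpoint-rule integral of the corrected density over a in (-1, 1)."""
    h = 2.0 / n
    total = 0.0
    for i in range(n):
        a = -1.0 + (i + 0.5) * h   # midpoint, strictly inside (-1, 1)
        u = math.atanh(a)          # invert the squashing
        total += math.exp(tanh_normal_log_prob(u, mu, sigma, eps=0.0)) * h
    return total


if __name__ == "__main__":
    # With the correction, the density over the bounded action integrates to ~1.
    print(integrate_density(0.3, 0.5))
    # For a large pre-tanh value, 1 - tanh(u)^2 rounds to 0 in floating point;
    # eps keeps the log term finite.
    print(tanh_normal_log_prob(20.0, 0.0, 1.0))
```

Without the subtracted term, the values would describe the density of u rather than of the squashed action a, so they would not integrate to 1 over (-1, 1). With `eps=0.0` and a saturated tanh, `math.log` would receive 0 and raise an error here; the equivalent tensor expression in PyTorch would return negative infinity instead.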
Thanks for your generous help @Howuhh. Is 1e-6 meant to keep the logarithm from approaching negative infinity?
Problem Description
Hi! Thanks for this clean script, which has helped me understand SAC. I have some questions about the implementation of SAC's get action function, mainly about the following code snippet: https://github.com/vwxyzjn/cleanrl/blob/2d660b6d3053ea9037c746b4c9f3a6faa1f20c44/cleanrl/sac_continuous_action.py#L139-L141
What is the purpose of this? Thanks!
Checklist
poetry install (see CleanRL's installation guideline).