vwxyzjn / cleanrl

High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)
http://docs.cleanrl.dev

get action in sac_continuous_action.py #428

Open zichunxx opened 7 months ago

zichunxx commented 7 months ago

Problem Description

Hi! Thanks for this clean script, which has helped me understand SAC.

I have a question about the implementation of SAC's `get_action` function, mainly the following code snippet:

https://github.com/vwxyzjn/cleanrl/blob/2d660b6d3053ea9037c746b4c9f3a6faa1f20c44/cleanrl/sac_continuous_action.py#L139-L141
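For context, the linked lines are roughly the log-probability correction applied after the tanh squashing. A standalone paraphrase (hypothetical shapes, and assuming `action_scale = 1` for simplicity; the exact code is at the link above):

```python
import torch
from torch.distributions import Normal

# Toy batch of 1 state with a 2-dimensional action space (hypothetical shapes).
mean, log_std = torch.zeros(1, 2), torch.zeros(1, 2)
normal = Normal(mean, log_std.exp())

x_t = normal.rsample()                         # reparameterized pre-squash sample
y_t = torch.tanh(x_t)                          # squashed action in (-1, 1)
log_prob = normal.log_prob(x_t)                # Gaussian log-prob of the pre-squash sample
log_prob -= torch.log(1 - y_t.pow(2) + 1e-6)   # the correction in question
log_prob = log_prob.sum(1, keepdim=True)       # sum over action dimensions
print(y_t, log_prob)
```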

What is the purpose of this? Thanks!


Howuhh commented 7 months ago

Usually in SAC we use a Normal distribution coupled with tanh to bound the action space. However, after such a transformation the actual distribution is no longer a plain Normal, so we cannot use its log_prob to get the probabilities of actions. This formula accounts for the transformation (a change of variables) and gives the correct log-probabilities for the TanhNormal distribution. See Appendix C in the original paper: https://arxiv.org/pdf/1801.01290.pdf
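If it helps, here is a rough standalone check (not from the repo; it uses PyTorch's built-in `TanhTransform` purely for comparison) that the manual change-of-variables correction matches the log-prob of the properly transformed distribution:

```python
import torch
from torch.distributions import Normal, TransformedDistribution
from torch.distributions.transforms import TanhTransform

base = Normal(torch.zeros(1), torch.ones(1))

u = base.rsample()        # pre-squash Gaussian sample
a = torch.tanh(u)         # squashed action in (-1, 1)

# Manual correction: log pi(a) = log N(u) - log(1 - tanh(u)^2)
manual = base.log_prob(u) - torch.log(1 - a.pow(2) + 1e-6)

# Reference: TransformedDistribution applies the same Jacobian term internally
tanh_normal = TransformedDistribution(base, [TanhTransform()])
reference = tanh_normal.log_prob(a)

print(manual.item(), reference.item())  # agree up to the 1e-6 epsilon
```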

zichunxx commented 7 months ago

Thanks for your generous help @Howuhh. Is the 1e-6 meant to keep the logarithm from going to negative infinity?
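I guess it is guarding against something like this (my own toy check, not from the repo), where tanh saturates to exactly 1.0 in float32 and `1 - y_t**2` becomes exactly 0:

```python
import torch

y_t = torch.tanh(torch.tensor([20.0]))     # == 1.0 exactly in float32
print(torch.log(1 - y_t.pow(2)))           # tensor([-inf])
print(torch.log(1 - y_t.pow(2) + 1e-6))    # finite, roughly log(1e-6) ≈ -13.8
```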