Closed yangysc closed 3 years ago
Yeah, this seems to be a bug. We probably never noticed this as we only use this distribution in SAC right now, where we don't need the entropy
or kl
methods. Will fix ...
Thanks for filing this @yangysc !
The problem with SquashedGaussian is that there is no analytical entropy term to evaluate. Instead, one would have to sample n times from the distribution and then take the mean neg log likelihood as an entropy estimate. We should probably raise an error when entropy or kl are called on this distribution.
PR: https://github.com/ray-project/ray/pull/13126
Btw, if you would like to use a bounded distribution for e.g. PPO, you can use the Beta distributions instead, which do have kl and entropy methods.
Yeah, this seems to be a bug. We probably never noticed this as we only use this distribution in SAC right now, where we don't need the
entropy
orkl
methods. Will fix ...
Thanks, @sven1977 ~ I was trying using recurrent PPO for continuous action space, since SAC does not have the recurrent policy. So I tried using squashedgaussian instead of diaggaussion, hoping for better performance. Thanks for your advice, I'll try the Beta distribution.
What is the problem?
In current latest code of TorchSquashedGaussian distribution ( and tf version) : should the entropy and kl function return scalar value per sample? For now, the current implementation seems to omit to sum in the last dimension
Reproduction (REQUIRED)
Here are the codes for TorchSquashedGaussian and DiagGaussian. It seems we need to override the entropy and kl functions in TorchSquashedGaussian, like wha we did in DiagGaussian .
If the code snippet cannot be run by itself, the issue will be closed with "needs-repro-script".