Fixed bug in log probability calculation for Diagonal Gaussian distribution

SVJayanthi commented 3 years ago

Description

The log probability calculation in https://github.com/hill-a/stable-baselines/blob/master/stable_baselines/common/base_class.py#L880-L884 is incorrect: logstd is mistaken for the log variance and std is mistaken for the variance. Three changes are suggested. First, since the action_prob rollout returns the standard deviation, the log of the standard deviation is computed from that term. Second, since logstd actually represents the log of the standard deviation, the 0.5 factor is removed so that the calculation matches https://github.com/hill-a/stable-baselines/blob/master/stable_baselines/common/distributions.py#L403-L405 exactly, which is mathematically correct. Third, the standard deviation is squared to obtain the variance used in the log probability calculation.
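For reference, the three changes above can be sketched in NumPy as follows. This is only an illustration of the intended math, not the exact code in the PR: the function name `diag_gaussian_logprob` and its argument layout (batch along the first axis, action dimensions along the last) are assumptions made for this example.

```python
import numpy as np


def diag_gaussian_logprob(actions, mean, std):
    """Log density of `actions` under N(mean, diag(std**2)), one value per row.

    Hypothetical helper for illustration; all inputs have shape (batch, n_dims).
    """
    # Change 1: the log standard deviation is derived from the returned std.
    logstd = np.log(std)
    n_elts = mean.shape[-1]
    # Change 2: no 0.5 factor on logstd, because log(std) is already
    # half of log(variance).
    log_normalizer = 0.5 * n_elts * np.log(2.0 * np.pi) + np.sum(logstd, axis=-1)
    # Change 3: divide by twice the variance (std squared), not twice the std.
    squared_term = np.sum(np.square(actions - mean) / (2.0 * np.square(std)), axis=-1)
    return -squared_term - log_normalizer
```

Note that when std equals 1 in every dimension, logstd is 0 and std equals the variance, so the original and corrected expressions give the same result; the discrepancy only shows up for other values of std.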

Motivation and Context

This change is required because the current code treats the standard deviation as the variance. As a result, the log probability of the Gaussian is computed incorrectly and the model returns a wrong value.

Closes #1059 Closes #1058

[x] I have raised an issue to propose this change (required for new features and bug fixes)

Types of changes

[x] Bug fix (non-breaking change which fixes an issue)
[ ] New feature (non-breaking change which adds functionality)
[ ] Breaking change (fix or feature that would cause existing functionality to change)
[ ] Documentation (update in the documentation)

Checklist:

[x] I've read the CONTRIBUTION guide (required)
[x] I have updated the changelog accordingly (required).
[ ] My change requires a change to the documentation.
[x] I have updated the tests accordingly (required for a bug fix or a new feature).
[ ] I have updated the documentation accordingly.
[ ] I have ensured pytest and pytype both pass (by running make pytest and make type).

sunshineclt commented 3 years ago

Great job @SVJayanthi! Could @araffin help review? Thanks!

araffin commented 3 years ago

Hello, thanks for the PR =) I hope to have more time end of this week or next week to do the review ;)
