This change is required because the calculation of log probability is inaccurate since the standard deviation is mistaken to be the variance. Therefore, the log probability of a Gaussian is inaccurately calculated and hence the model returns a false value.
Closes #1059
Closes #1058
[x] I have raised an issue to propose this change (required for new features and bug fixes)
Types of changes
[x] Bug fix (non-breaking change which fixes an issue)
[ ] New feature (non-breaking change which adds functionality)
[ ] Breaking change (fix or feature that would cause existing functionality to change)
Description
The calculation of log probability in https://github.com/hill-a/stable-baselines/blob/master/stable_baselines/common/base_class.py#L880-L884 is incorrect since the logstd is mistook as logvariance and std is mistook as variance. Three changes are suggested, the first being the rollout of action_prob returns the standard deviation and from that term the log of the standard deviation is calculated. Additionally, since the logstd actually represents the log of the standard deviation, the 0.5 is removed such that it is exactly the calculation in https://github.com/hill-a/stable-baselines/blob/master/stable_baselines/common/distributions.py#L403-L405 which is mathematically correct. The third change is that the square of the standard deviation is taken to get the variance in the calculation of log probability.
Motivation and Context
This change is required because the calculation of log probability is inaccurate since the standard deviation is mistaken to be the variance. Therefore, the log probability of a Gaussian is inaccurately calculated and hence the model returns a false value.
Closes #1059 Closes #1058
Types of changes
Checklist:
pytest
andpytype
both pass (by runningmake pytest
andmake type
).