hill-a / stable-baselines

A fork of OpenAI Baselines, implementations of reinforcement learning algorithms
http://stable-baselines.readthedocs.io/
MIT License
4.16k stars 725 forks source link

Fixed bug in log probability calculation for Diagonal Gaussian distribution #1060

Closed SVJayanthi closed 3 years ago

SVJayanthi commented 3 years ago

Description

The calculation of log probability in https://github.com/hill-a/stable-baselines/blob/master/stable_baselines/common/base_class.py#L880-L884 is incorrect since the logstd is mistook as logvariance and std is mistook as variance. Three changes are suggested, the first being the rollout of action_prob returns the standard deviation and from that term the log of the standard deviation is calculated. Additionally, since the logstd actually represents the log of the standard deviation, the 0.5 is removed such that it is exactly the calculation in https://github.com/hill-a/stable-baselines/blob/master/stable_baselines/common/distributions.py#L403-L405 which is mathematically correct. The third change is that the square of the standard deviation is taken to get the variance in the calculation of log probability.

Motivation and Context

This change is required because the calculation of log probability is inaccurate since the standard deviation is mistaken to be the variance. Therefore, the log probability of a Gaussian is inaccurately calculated and hence the model returns a false value.

Closes #1059 Closes #1058

Types of changes

Checklist:

sunshineclt commented 3 years ago

Great job @SVJayanthi! Could @araffin help review? Thanks!

araffin commented 3 years ago

Hello, thanks for the PR =) I hope to have more time end of this week or next week to do the review ;)