strawlab / best

Bayesian estimation supersedes the t test
MIT License
145 stars 37 forks source link

Sigma ≠ standard deviation ⇒ introduce Model v2 #8

Open treszkai opened 5 years ago

treszkai commented 5 years ago

This PR fixes a bug in the model specification, which is present in the original publication and the R package as well.

The standard deviation of a t-distribution is not σ. Instead, variance = nu / (nu - 2) * sigma ** 2 if nu > 2, and variance = ∞ if 1 < nu <= 2. Therefore, any posterior sample nu <= 2 assigns a non-zero probability that the variance is ∞, which in real life is never the case.

Proof that SD is underestimated by the R package (Python output is essentially the same):

> require(BEST)
Loading required package: BEST
Loading required package: HDInterval
> N=1000
> testDataDrug = rt(N, df=4)  # sigma = 1
> sd(testDataDrug)
[1] 1.417539
> BESTout = BESTmcmc(testDataDrug)
Waiting for parallel processing to complete...done.
> mean(BESTout$sigma)
[1] 1.006396
> mean(sqrt(BESTout$nu / (BESTout$nu - 2)) * BESTout$sigma)
[1] 1.47081

Samples of nu close to 2 are also problematic, as they blow up the SD estimation. Therefore I changed the sampling of nu as such:

( t(df=2.5) is very similar to t(df=2) when it's not extremely far from the mean.)


The other change concerns the sampling of σ.


As a consequence, reduced the displayed limits of the std.devs in plot_posterior and plot_all to 99.5% of the samples.

(This is part 3 of 4 pull requests that build on top of one another.)