richarddmorey / BayesFactor

BayesFactor R package for Bayesian data analysis with common statistical models.
https://richarddmorey.github.io/BayesFactor/
131 stars 48 forks source link

Wrong rho in correlationBF with an interval null? #127

Closed lindeloev closed 5 years ago

lindeloev commented 5 years ago

rho is remarkably lower when setting nullInterval in correlationBF. Here is a pretty strong correlation of around rho=0.95:

D = data.frame(a=1:100, b=1:100+rnorm(100, sd=10))

But it is estimated to be around 0.09 when setting a nullInterval:

summary(correlationBF(D$a, D$b, nullInterval=c(-0.1, 0.1), posterior=TRUE, iterations=10000))
1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:

        Mean       SD  Naive SE Time-series SE
rho  0.09466 0.005288 5.288e-05      5.288e-05
zeta 0.09495 0.005330 5.330e-05      5.330e-05

2. Quantiles for each variable:

        2.5%     25%     50%     75%   97.5%
rho  0.08059 0.09269 0.09624 0.09842 0.09987
zeta 0.08076 0.09296 0.09654 0.09874 0.10020

Not so if it is a point-null:

summary(correlationBF(D$a, D$b, posterior=TRUE, iterations=10000))
1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:

       Mean      SD  Naive SE Time-series SE
rho  0.9316 0.01349 0.0001349      0.0002996
zeta 1.6798 0.10139 0.0010139      0.0020866

2. Quantiles for each variable:

       2.5%    25%    50%    75%  97.5%
rho  0.9023 0.9234 0.9329 0.9414 0.9541
zeta 1.4847 1.6119 1.6806 1.7499 1.8753

The BFs seem to be consistent, though.

richarddmorey commented 5 years ago

This is the correct behaviour; what has happened is that by setting nullInterval, you've restricted the posterior to values between -.1 and .1. If you look at the histogram of the samples in the nullInterval case, you'll see they're right up against the upper bound that was set (.1). That's because IF the value is between -.1 and .1, then it certainly MUST be close to .1 (because the data are consistent with large values).

lindeloev commented 5 years ago

Thanks for the explanation, @richarddmorey! I see what you're saying, but maybe it would be helpful to clarify this behavior in the documentation or as a footnote to the summary output. I would expect the sampler to visit both the null and the alternative. What you're describing I would only expect if the prior was set to zero outside nullInterval. Clearly, my expectations were wrong, but maybe others would have the same misconception.

Anyway, thanks - it's a great package!

richarddmorey commented 5 years ago

Yes, you're right, it could be clearer. Specifically, it samples from the numerator model, which could be either the null or the alternative. I see this is only mentioned in help for the posterior() function, not for the posterior arguments.

I will note, however, that if you look at the samples object:

s = correlationBF(D$a, D$b, nullInterval=c(-0.1, 0.1), posterior=TRUE, iterations=10000)

the model is specified at the end of the output:

# > s
# [lines of output snipped]
---
 Model:
Type: BFcorrelation, Jeffreys-beta*
Alternative, r = 0.333333333333333, rho =/= 0 -0.1<rho<0.1 

But the summary() function strips that out, since it is not a BayesFactor function.