richarddmorey / BayesFactor

BayesFactor R package for Bayesian data analysis with common statistical models.
https://richarddmorey.github.io/BayesFactor/

Null interval t>5 approximation gives large discontinuous change in BF from t<=5 to t>5 #167

Closed jhmaindonald closed 1 year ago

jhmaindonald commented 1 year ago

Attachment: BayesFactor Null Interval Large t Approximation.pdf

Try the following:

library(latticeExtra)
library(BayesFactor)

## Bayes factors against the interval null `mu` and its complement,
## from a t statistic and sample size; returns the two component BFs
## and their ratio (complement alternative vs. interval null)
t2bfInterval <- function(t, n=10, rscale="medium", mu=c(-.1,.1)){
  null0 <- BayesFactor::ttest.tstat(t=t, n1=n, nullInterval=mu,
                                    rscale=rscale, simple=TRUE)
  alt0 <- BayesFactor::ttest.tstat(t=t, n1=n, nullInterval=mu,
                                   rscale=rscale, complement=TRUE, simple=TRUE)
  c(null0, alt0, alt0/null0)
}
tval <- seq(from=4.98, to=5.02, by=0.01)
nval <- 3:8
bfDF <- expand.grid(t=tval, n=nval)
ncol <- 2; tcol <- 1
other <- apply(bfDF, 1, function(x)
  t2bfInterval(t=x[tcol], n=x[ncol], mu=c(-0.1, 0.1)))
bfDF <- setNames(cbind(bfDF, t(other)),
                 c('t','n','null0','alt0','bfInterval'))
bfDF
bfDF[27:30,]
bfDF[27:30,]
      t n     null0     alt0 bfInterval
27 4.99 8 1.0703126 31.09391   29.05125
28 5.00 8 1.0703881 31.37953   29.31603
29 5.01 8 0.1629479 31.75659  194.88802
30 5.02 8 0.1623894 32.04688  197.34590
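The latticeExtra package is loaded above but not otherwise used in this snippet; presumably it was for plots of the kind in the attached PDF. A minimal sketch of one way to visualise the jump (my own plotting call, not from the original report):

xyplot(bfInterval ~ t | factor(n), data=bfDF, type="b",
       scales=list(y=list(log=TRUE)),
       xlab="t statistic", ylab="interval Bayes factor (log scale)")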
jhmaindonald commented 1 year ago

The following demonstrates the discontinuity when using ttestBF()

d <- c(1.6,2.75,1.68,1.65,0.37,1.36,2.16,1.16,4.97,1.74)
ttestBF(d-0.001, n1=10, nullInterval=c(-.1,.1))
# . . .
# [1] Alt., r=0.707 -0.1<d<0.1    : 1.107262 ±0%
# [2] Alt., r=0.707 !(-0.1<d<0.1) : 57.53165 ±0%
# . . .
ttestBF(d, n1=10, nullInterval=c(-.1,.1))
# t is large; approximation invoked.
# t is large; approximation invoked.
# . . .
# [1] Alt., r=0.707 -0.1<d<0.1    : 0.1948791 ±NA%
# [2] Alt., r=0.707 !(-0.1<d<0.1) : 57.79087  ±NA%
# . . .
jhmaindonald commented 1 year ago

I accidentally closed this.

richarddmorey commented 1 year ago

Thanks for the report, I'll check it out.

richarddmorey commented 1 year ago

This is due to the approximation being particularly terrible with small n. I can introduce a better way of deciding when to invoke the approximation, because it also might not be needed for small n with t statistics as small as 5.

jhmaindonald commented 1 year ago

I do think there should be agreement at the point of cross-over to the approximation. Even for n=300, the approximation is not great for a null interval of (-0.01, 0.01) or (-0.1, 0.1):

            Null = (-0.01, 0.01)          Null = (-0.1, 0.1)
   t    n null.01    alt.01  bfInt.01    null.1     alt.1  bfInt.1
5.00    3   1.000     3.039     3.039     1.009     3.218    3.190
5.01    3   0.083     3.054    36.987     0.083     3.317   39.823

5.00   30   1.006   909.768   903.934     1.738   989.973  569.451
5.01   30   0.387   933.132  2409.182     0.555  1015.512 1830.829

5.00  300   1.114  9565.781  8588.488    82.147 10402.732  126.635
5.01  300   0.944 10013.835 10603.656    64.850 10892.063  167.957

5.00 3000   2.582  5228.233  2024.726 39701.764  1790.458    0.045
5.01 3000   2.533  5494.020  2168.572 41383.758  1914.511    0.046

It would be straightforward to apply an ad hoc adjustment that makes no change asymptotically. This would make best sense if tied to, say, the third and fourth moments of the relevant distributions, or to other such statistics.
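One simple form such an adjustment could take, sketched under the assumption that the exact (quadrature) and approximate computations are available as separate functions, which the package API does not expose; exact_bf() and approx_bf() below are hypothetical stand-ins, and the decay rate is arbitrary:

## Sketch: pull the approximation into agreement with the exact value
## at the crossover t0, via a correction that dies away for large t.
## exact_bf() and approx_bf() are hypothetical, not BayesFactor functions.
adjusted_approx_bf <- function(t, t0, exact_bf, approx_bf) {
  k <- exact_bf(t0) / approx_bf(t0)  # mismatch factor at the crossover
  w <- exp(-(t - t0))                # weight: 1 at t0, -> 0 as t grows
  k^w * approx_bf(t)                 # equals exact_bf(t0) at t0; ~approx_bf(t) asymptotically
}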

I notice that the BayesTools package, which has its own implementation of the Savage-Dickey approximation, includes in its ComparisonR vignette a comparison with the BayesFactor package. It is not obvious to me how to do an equivalent comparison for the kind of null-interval analysis I've done here, but I wonder whether there may be useful ideas there.

Code for the main part of the above output is:

library(BayesFactor)

## As defined above: interval-null and complement BFs, plus their ratio
t2bfInterval <- function(t, n=10, rscale="medium", mu=c(-.1,.1)){
  null0 <- BayesFactor::ttest.tstat(t=t, n1=n, nullInterval=mu,
                                    rscale=rscale, simple=TRUE)
  alt0 <- BayesFactor::ttest.tstat(t=t, n1=n, nullInterval=mu,
                                   rscale=rscale, complement=TRUE, simple=TRUE)
  c(null0, alt0, alt0/null0)
}
##
tval <- c(5, 5.01)
nval <- c(3, 30, 300, 3000)
bfDF <- expand.grid(t=tval, n=nval)
ncol <- 2; tcol <- 1
other <- apply(bfDF, 1, function(x)
  c(t2bfInterval(t=x[tcol], n=x[ncol], mu=c(-0.01, 0.01)),
    t2bfInterval(t=x[tcol], n=x[ncol], mu=c(-0.1, 0.1))))
setNames(cbind(bfDF, t(other)),
         c('t','n','null.01','alt.01','bfInt.01','null.1','alt.1','bfInt.1'))
richarddmorey commented 1 year ago

I've pushed a commit that makes the invocation of the approximation much rarer. The new code only uses the approximation as a fallback if the quadrature fails. I agree in principle that a smoother transition to the approximation would be preferable, but that would be a bit tricky: you'd need to work out, e.g., the closest t statistic at which the quadrature still succeeds, which differs across Ns and r scales. Perhaps the answer is just a better approximation.
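In outline, the fallback logic described here is something like the following (a schematic sketch, not the package's actual code; integrand() and large_t_approx() are hypothetical stand-ins):

## Schematic quadrature-with-fallback: attempt integrate() first and
## use the large-t approximation only when quadrature errors out.
## integrand() and large_t_approx() are hypothetical stand-ins.
bf_component <- function(integrand, lower, upper, large_t_approx) {
  area <- tryCatch(
    integrate(integrand, lower=lower, upper=upper)$value,
    error = function(e) NA_real_
  )
  if (is.na(area)) large_t_approx() else area
}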

jhmaindonald commented 1 year ago

The approximation does indeed now seem to be invoked only in quite extreme cases.
Note, however, that there can now be an issue for mu=0.

BFInfo()
Package BayesFactor
0.9.12-4.5 
t2bfInterval <- function(t, n=10, rscale="medium", mu=c(-.1,.1)){
  null0 <- BayesFactor::ttest.tstat(t=t, n1=n, nullInterval=mu,
                                    rscale=rscale, simple=TRUE)
  alt0 <- BayesFactor::ttest.tstat(t=t, n1=n, nullInterval=mu,
                                   rscale=rscale, complement=TRUE, simple=TRUE)
  alt0/null0
}
##
## Calculate point-null Bayes factors for very small two-sided p-values
pval <- c(1e-7, 1e-8, 1e-12); nval <- c(3, 4)
bfDF <- expand.grid(p=pval, n=nval)
pcol <- 1; ncol <- 2; tcol <- 3
bfDF[,'t'] <- apply(bfDF, 1, function(x)
  qt(x[pcol]/2, df=x[ncol]-1, lower.tail=FALSE))
## try() records NA where integrate() fails inside ttest.tstat,
## rather than aborting at the first error
other <- apply(bfDF, 1, function(x) {
  bf <- try(BayesFactor::ttest.tstat(t=x[tcol], n1=x[ncol],
                                     rscale="medium", simple=TRUE))
  if (inherits(bf, "try-error")) NA else bf
})
bfDF <- cbind(bfDF, BF=other)
Error in integrate(meta.t.like, lower = (lower - mean.delta)/scale.delta,  : 
  roundoff error is detected in the extrapolation table
Error in integrate(meta.t.like, lower = (lower - mean.delta)/scale.delta,  : 
  the integral is probably divergent
Error in integrate(meta.t.like, lower = (lower - mean.delta)/scale.delta,  : 
  roundoff error is detected in the extrapolation table
round(bfDF,1)
  p n         t       BF
1 0 3    3162.3  16139.8
2 0 3   10000.0       NA
3 0 3 1000000.0       NA
4 0 4     280.4  18153.0
5 0 4     604.2 114927.2
6 0 4   13016.4       NA

Certainly, there is a question of how meaningful a Bayes factor is for such large values of t!

The code is not easy to find one's way around. A note on entry points into the code, for anyone who wants to experiment with the settings, would be helpful.