paul-buerkner / brms

brms R package for Bayesian generalized multivariate non-linear multilevel models using Stan
https://paul-buerkner.github.io/brms/
GNU General Public License v2.0

Suggestion (possibly stupid): support Savage-Dickey ratio estimation for vector point hypotheses #743

Closed EmmanuelCharpentier closed 5 years ago

EmmanuelCharpentier commented 5 years ago

The problem

Let's start with a simple example: a one-way ANOVA, aiming to assess the differences in means between three groups:

G X
A -0.46
A -0.567
A 0.16
A 1.16
B 1.245
B 1.17
B 1.129
B 1.947
C 2.784
C 2.73
C 2.886
C 2.798
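
For reproducibility, the table above can be entered directly in R (a minimal sketch; the issue itself does not show how `Data` was built):

```r
# Rebuild the example data frame used in the models below
Data <- data.frame(
  G = factor(rep(c("A", "B", "C"), each = 4)),
  X = c(-0.46, -0.567, 0.16, 1.16,
        1.245, 1.17, 1.129, 1.947,
        2.784, 2.73, 2.886, 2.798)
)
```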

The frequentist analysis of these data is easy: we can fit a linear model to them and "test" for the existence of a Group effect (here by a likelihood-ratio test):

LM <- lm(X~G, data=Data)
drop1(LM, test="Chisq")

Single term deletions

Model:
X ~ G
       Df Sum of Sq     RSS      AIC  Pr(>Chi)    
<none>               2.3432 -13.6007              
G       2    14.879 17.2219   6.3353 6.344e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

The intergroup comparisons are slightly more intricate, but pose no real problem:

library(multcomp)
summary(glht(LM, linfct=mcp(G="Tukey")))

   Simultaneous Tests for General Linear Hypotheses

Multiple Comparisons of Means: Tukey Contrasts

Fit: lm(formula = X ~ G, data = Data)

Linear Hypotheses:
           Estimate Std. Error t value Pr(>|t|)    
B - A == 0   1.2997     0.3608   3.602  0.01418 *  
C - A == 0   2.7265     0.3608   7.557  < 0.001 ***
C - B == 0   1.4269     0.3608   3.955  0.00801 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Adjusted p values reported -- single-step method)

In both cases, the hypothesis tests allow us to reject the "null hypotheses" at the sacrosanct type I error rate of 5%, but they do not tell us what degree of support the data provide against those hypotheses.

A Bayesian analysis of the same data easily shows that the support for intergroup differences may be scant:

library(brms)
BLM <- brm(X~G, data=Data, family=gaussian, prior=prior(student_t(3,0,10)),
     sample_prior="yes", save_all_pars=TRUE, seed=1723)
hypothesis(BLM,c("GB=0", "GC=0", "(GC-GB)=0"))

Hypothesis Tests for class b:
     Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob Star
1      (GB) = 0     1.28      0.43     0.44     2.15       0.37      0.27    *
2      (GC) = 0     2.71      0.43     1.86     3.54       0.00      0.00    *
3 ((GC-GB)) = 0     1.43      0.44     0.56     2.32       0.50      0.33    *
---
'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
'*': For one-sided hypotheses, the posterior probability exceeds 95%;
for two-sided hypotheses, the value tested against lies outside the 95%-CI.
Posterior probabilities of point hypotheses assume equal prior probabilities.

In this example, only the difference between the means of groups A and C can be taken seriously. The other two have only very weak support from the data.

We would like to get the same kind of assessment for the "factor" test, answering the following question:

"How well do the data support the hyothesis of an effect of the group factor ?"

Currently, the only way is to refit a reduced model and compare it to the original model:

BLM0 <- update(BLM, .~.-G, seed=1723)
BLM <- add_criterion(BLM, criterion = c("loo", "waic", "marglik", "R2"),
           overwrite=TRUE, reloo=TRUE, silent=TRUE,
           open_progress=FALSE, seed=1723, model_name="BLM")
BLM0 <- add_criterion(BLM0, criterion = c("loo", "waic", "marglik", "R2"),
            overwrite=TRUE, reloo=TRUE, silent=TRUE,
            open_progress=FALSE, seed=1723, model_name="BLM0")
round(sapply(c("loo", "loo2", "waic", "marglik"),
       function(w) model_weights(BLM, BLM0, weights=w, reloo=TRUE)),
      4)
No problematic observations found. Returning the original 'loo' object.
No problematic observations found. Returning the original 'loo' object.
        loo   loo2   waic marglik
BLM  0.9996 0.9175 0.9999  0.9562
BLM0 0.0004 0.0825 0.0001  0.0438
Warning message:
Some Pareto k diagnostic values are too high. See help('pareto-k-diagnostic') for details.
save.image("Propal.RData")

Modulo the (false-alarm) warnings, this does what we want and shows that, in the present case, the data's support for the hypothesis of no Group effect (i.e. the weight of BLM0) is not inconsiderable, and depends on the way you assess it.

But obtaining this result is somewhat expensive. We have to:

- refit a reduced model omitting the Group factor;
- compute the various criteria (LOO, WAIC, marginal likelihood) for both models, which entails further sampling;

before comparing.

Proposal

In that specific case, we would get the same information by assessing the degree of support given by the data to the vector hypothesis GB = 0 ∧ GC = 0. Unless my understanding of the Savage-Dickey ratio is incorrect, the relevant Bayes factor could be estimated by the ratio of the posterior and prior densities of the vector (GB, GC) at the point (0, 0).

What is needed to get this ratio?

All that is lacking is a way to estimate the joint density of (GB, GC), for both the prior and the posterior samples, at a given point.

More generally, we want to build and use p-dimensional kernels.

There are a lot of solutions in R for building bidimensional kernels; more general solutions are a bit scarcer. However, I am aware of at least one package (ks) that boasts the ability to build kernels of "any" dimension. I am also aware of various algorithms for creating such kernels, notably fastKDE (O'Brien, 2016), whose reference implementation is in Python (and thus might require porting).
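
As a base-R illustration of the idea (a sketch only, with made-up normal draws standing in for the real prior and posterior samples of (GB, GC)), a product-Gaussian kernel evaluated at (0, 0) gives a Savage-Dickey-style ratio:

```r
# Bivariate product-Gaussian KDE evaluated at a single point (base R only)
kde2_at <- function(draws, point) {
  n <- nrow(draws)
  # Silverman-style per-dimension bandwidth for d = 2
  h <- apply(draws, 2, sd) * n^(-1/6)
  z1 <- (point[1] - draws[, 1]) / h[1]
  z2 <- (point[2] - draws[, 2]) / h[2]
  mean(dnorm(z1) * dnorm(z2)) / (h[1] * h[2])
}

set.seed(1723)
# Hypothetical draws: wide prior around 0, posterior around the estimates above
prior_draws <- cbind(rnorm(4000, 0, 10),  rnorm(4000, 0, 10))
post_draws  <- cbind(rnorm(4000, 1.3, 0.45), rnorm(4000, 2.7, 0.45))

# Savage-Dickey: BF01 = posterior density / prior density at (GB, GC) = (0, 0)
bf01 <- kde2_at(post_draws, c(0, 0)) / kde2_at(prior_draws, c(0, 0))
```

With real brms fits the draws would come from the prior and posterior samples of the two coefficients rather than from `rnorm`.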

An interesting problem is to assess how many sample points we need to realistically assess the prior and posterior densities. Since the advent of the bootstrap, a good rule of thumb has been that, for a one-dimensional parameter, a sample size equivalent to 1000 independent samples is enough to quote an estimate of the density, the mean, the variance, or a quantile. What is the equivalent for a bidimensional kernel?
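
One crude way to probe that question (again a sketch only, using standard-normal draws and the density at the origin as the target) is to compare the relative error of a kernel estimate in one and two dimensions at the same sample size:

```r
# Relative error of a product-Gaussian KDE at the origin of a d-dimensional
# standard normal, averaged over replications (base R only)
kde_rel_err <- function(n, d, reps = 100) {
  true_d <- (2 * pi)^(-d / 2)  # N(0, I_d) density at the origin
  mean(replicate(reps, {
    x <- matrix(rnorm(n * d), ncol = d)
    h <- apply(x, 2, sd) * n^(-1 / (d + 4))  # Silverman-style bandwidth
    # Evaluate the KDE at 0: dnorm is symmetric, so (0 - x)/h reduces to x/h
    est <- mean(apply(dnorm(x %*% diag(1 / h, d)), 1, prod)) / prod(h)
    abs(est - true_d) / true_d
  }))
}

set.seed(1723)
kde_rel_err(1000, 1)  # 1000 draws, one dimension
kde_rel_err(1000, 2)  # same 1000 draws, two dimensions
```

On typical runs the two-dimensional error comes out larger, consistent with the usual curse-of-dimensionality intuition.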

Is this problem worthy of consideration for inclusion in brms?

paul-buerkner commented 5 years ago

I think this solution is too specific to be implemented in brms, especially because I am not sure how the solution scales with dimension. Estimating densities from samples, which we need for Bayes factors essentially, is problematic anyway and I would expect it to become even more complex for multivariate distributions. Unless I see evidence for the computational robustness of such methods (or diagnostics that can tell when they fail) I don't think it is a good idea to implement them.

I would recommend going the "expensive" route and fitting both models, then comparing them using the various available approaches, just as you showed above.

EmmanuelCharpentier commented 5 years ago

Your arguments are sound.

I do not (yet) know enough about the various ways of estimating a multidimensional kernel, but on the "number of points" front, my attempts aren't very encouraging...

Unless I'm stricken by a thunderbolt of inspiration (bloody unlikely...), I think that this issue can be closed.

The real difficulty is to wean medical journals' reviewers off their NHST habits...

paul-buerkner commented 5 years ago

Yes, this is indeed the harder problem... Closing this issue now.