I think this solution is too specific to be implemented in brms, especially because I am not sure how the solution scales with dimension. Estimating densities from samples, which is essentially what we need for Bayes factors, is problematic anyway, and I would expect it to become even more complex for multivariate distributions. Unless I see evidence for the computational robustness of such methods (or diagnostics that can tell when they fail) I don't think it is a good idea to implement them.
I would recommend going the "expensive" route and fitting both models, then comparing them using the various available approaches, just as you showed above.
Your arguments are sound.
I do not (yet) know enough about the various ways of estimating a multidimensional kernel, but on the "number of points" front, my attempts aren't very encouraging...
Unless I'm struck by a thunderbolt of inspiration (bloody unlikely...), I think that this issue can be closed.
The real difficulty is to wean medical journals' reviewers off their NHST habits...
Yes, this is indeed the harder problem... Closing this issue now.
The problem
Let's start with a simple example: a one-way ANOVA, aiming to assess differences in means between three groups.
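For concreteness, a small simulated dataset of this kind (the group labels, means, spread, and sample sizes below are purely illustrative assumptions, not the original data):

```r
set.seed(2023)  # arbitrary seed, for reproducibility
d <- data.frame(
  Group = factor(rep(c("A", "B", "C"), each = 20)),
  y     = rnorm(60, mean = rep(c(10, 11, 12), each = 20), sd = 2)
)
```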
The frequentist analysis of these data is easy: we can fit a linear model and "test" for the existence of a `Group` effect (here by a likelihood-ratio test). The intergroup comparisons are slightly more intricate, but pose no real problem.
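A sketch of both steps, assuming the simulated data frame `d` above (model and object names are mine):

```r
## Full and reduced frequentist models
FLM1 <- lm(y ~ Group, data = d)
FLM0 <- lm(y ~ 1,     data = d)

## "Test" of the Group effect (likelihood-ratio-type comparison)
anova(FLM0, FLM1, test = "Chisq")

## Pairwise intergroup comparisons (Holm-adjusted t tests)
pairwise.t.test(d$y, d$Group, p.adjust.method = "holm")
```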
In both cases, the hypothesis tests allow us to reject the "null hypotheses" at the sacrosanct type I error rate of 5%, but do not tell us what degree of support the data provide against these hypotheses.
A Bayesian analysis of the same data shows easily that the support for intergroup differences may be scant.
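For instance, with brms (priors, object names, and the rest of the setup are assumptions; `sample_prior = "yes"` is needed so that `hypothesis()` can compute Savage-Dickey evidence ratios):

```r
library(brms)

BLM1 <- brm(y ~ Group, data = d,
            prior        = set_prior("normal(0, 5)", class = "b"),
            sample_prior = "yes",                   # needed for Savage-Dickey ratios
            save_pars    = save_pars(all = TRUE))   # needed later for bridge sampling

## Support for each intergroup difference (evidence ratios via Savage-Dickey)
hypothesis(BLM1, c("GroupB = 0",
                   "GroupC = 0",
                   "GroupC - GroupB = 0"))
```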
In this example, only the difference between the means of groups A and C can be taken seriously. The other two have only very weak support from the data.
We would like to get the same assessment for the "factor" test, answering the following question: what degree of support do the data give to the hypothesis of no `Group` effect at all?
Currently, the only way is to refit a reduced model and compare it to the original model:
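Continuing the sketch above, this would be something like (object names are mine):

```r
## Reduced model, without the Group factor
BLM0 <- brm(y ~ 1, data = d,
            sample_prior = "yes",
            save_pars    = save_pars(all = TRUE))

## Several ways to compare the two models
bayes_factor(BLM1, BLM0)   # bridge-sampling Bayes factor
post_prob(BLM1, BLM0)      # posterior model probabilities (equal prior odds)
loo(BLM1, BLM0)            # PSIS-LOO comparison
```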
Modulo the (false-alarm) warnings, this does what we want, and shows that, in the present case, the data's support for the hypothesis of no `Group` effect (i.e. the weight of `BLM0`) is not inconsiderable, and depends on the way you assess it. But obtaining this result is somewhat expensive: we have to refit a second model, deprived of the `Group` factor, before comparing.
Proposal
In that specific case, we would get the same information by assessing the degree of support given by the data to the joint hypothesis $GB = 0 \wedge GC = 0$. Unless my understanding of the Savage-Dickey ratio is incorrect, the relevant Bayes factor could be estimated by the ratio of the posterior and prior densities of the vector (GB, GC) at the point (0, 0).

What is needed to get this ratio?
Samples of `GB` and `GC` from their posterior joint distribution (which the fitted model already provides), and an estimate of their joint density at the point (0, 0). All that is lacking is a way to build a bidimensional kernel density estimate from those samples and to evaluate it at (0, 0); a sketch of this computation follows.
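A minimal sketch of that computation, assuming the `BLM1` fit and the independent normal(0, 5) priors from above, and using the `ks` package (mentioned below) for the bidimensional kernel:

```r
library(ks)
library(posterior)

## Posterior draws of the two contrast coefficients
post <- as.matrix(as_draws_df(BLM1)[, c("b_GroupB", "b_GroupC")])

## Bidimensional kernel estimate of the posterior density, evaluated at (0, 0)
post_dens_0 <- kde(x = post, eval.points = matrix(0, nrow = 1, ncol = 2))$estimate

## Prior density at (0, 0): analytic, since the two priors are independent normal(0, 5)
prior_dens_0 <- dnorm(0, 0, 5)^2

## Savage-Dickey estimate of the Bayes factor in favour of GB = 0 and GC = 0
BF01 <- post_dens_0 / prior_dens_0
BF01
```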
More generally, we want to build and use p-dimensional kernels.
There are a lot of solutions in R for building bidimensional kernels; more general solutions are a bit scarcer. However, I am aware of at least one package (`ks`) that boasts its ability to build kernels of "any" dimension. I am also aware of various algorithms for creating such kernels, notably `fastKDE` (O'Brien, 2016), whose reference implementation is in Python (and thus might require porting).

An interesting problem is to assess how many sample points we need to realistically estimate the prior and posterior densities. Since the advent of the bootstrap, a good rule of thumb has been that, for a one-dimensional parameter, a sample equivalent to 1000 independent draws is enough to quote an estimate of the density, the mean, the variance, or a quantile. What is the equivalent for a bidimensional kernel?
Is this problem worthy of consideration for inclusion in `brms`?