quentingronau / bridgesampling

R package for bridge sampling
32 stars 16 forks source link

Reliability of running multiple repetitions of bridge_sampler when logml cannot be determined within maxiter? #35

Open mawilson1234 opened 1 year ago

mawilson1234 commented 1 year ago

I'm fitting some rather complicated models using brms, and have been finding difficulty getting bridge_sampler to run without getting the warning message about being unable to determine logml within maxiter. The models involve measurement error terms, which seems to be making it (understandably) difficult to achieve a good estimate. Even saving all the parameters (including the latent draws), 90k samples/chain has proved insufficient to remove the warning. These models are getting quite bulky, and while I have access to a cluster with lots of RAM to run them, I'm wondering if increasing the number of samples is really feasible past a certain point. I would still like to compute the logml for use in estimating the Bayes factors for model comparisons if possible, though, since a reviewer requested it.

Based on a response I got here from Ed Merkle, it sounds like one possible thought is to run bridge_sampler with >1 repetition (I was thinking about 100 as a starting point, and maybe more depending on the variance of the resulting distribution) to get a distribution of the logmls, and then estimate the Bayes factors using that. However, I'm not sure whether this will actually address the concern, so he suggested I ask the bridgesampling developers directly.

Put simply, if logml can't be estimated within maxiter, will the median of a distribution of logmls produced by bridge_sampler with multiple repetitions converge to a good estimate of the "true" estimated logml, or not?

mawilson1234 commented 1 year ago

I ran some tests on a smaller model and it does appear that running more repetitions when logml cannot be determined within maxiter is not a workaround; the distributions do not converge to the same values.

I ran one model with 20k samples/chain (low enough that logml could not be determined within maxiter), and one model with 40k samples/chain (high enough that it could) for 100 repetitions. Here are the results of a simple t.test on the logmls:

        Welch Two Sample t-test

data:
logml20k and logml40k
t = -98.946, df = 152.32, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -31.03350 -29.81846
sample estimates:
mean of x mean of y
-23514.54 -23484.11

The 20k model has a significantly lower logml than the 40k model. There's also noticeably more variability in the 20k model compared to the 40k model (as the warning message advises):

> quantile(logml20k)
       0%       25%       50%       75%      100%
-23520.55 -23516.37 -23514.78 -23513.09 -23504.69

> quantile(logml40k)
       0%       25%       50%       75%      100%
-23487.10 -23485.22 -23484.16 -23483.04 -23480.13

dists

Thankfully I was able to get the actual model to work with 100k samples/chain, so I think I'll stick to that for more reliable estimates.