stan-dev / loo

loo R package for approximate leave-one-out cross-validation (LOO-CV) and Pareto smoothed importance sampling (PSIS)
https://mc-stan.org/loo

Would a model_avg function make sense? #82

Closed: lukeholman closed this issue 6 years ago

lukeholman commented 6 years ago

Thanks for an interesting and well-documented package!

So, I am new to Bayesian approaches. In the frequentist world, I did many analyses like the following: specify a set of plausible models, rank them by their AIC values, and average across the models with the appropriate weights to obtain model-averaged parameter estimates, predictions, etc.

# dat stands in for the analysis data frame; dredge() requires na.action = na.fail
full_model <- lm(y ~ x1 + x2, data = dat, na.action = na.fail)
aic_table <- MuMIn::dredge(full_model)
model_averaging_results <- MuMIn::model.avg(aic_table)

As I understand it, loo provides a way to estimate model weights, which are a lot like Akaike weights in terms of their interpretation (i.e. models with a weight near 1 are likely to be the 'best' model in the set) and intended use (i.e. the aim is then to average across models - NOT to simply pick a single top model). Hope that's right?

Assuming I understand correctly, would it make sense to add a convenience function with a similar aim to MuMIn::model.avg? Ideally, the new loo::model_avg function could be used like this (here I assume it would again be written in a way that allows integration with brms, rstanarm, etc.):

model1 <- brm(y ~ x1 + x2, data = dat)
model2 <- brm(y ~ x1, data = dat)
loo_values <- loo_model_weights(model1, model2)
model_averaging_results <- model_avg(object = list(model1, model2),
                                     weights = loo_values)
summary(model_averaging_results)  # averaged posterior distributions for each parameter
predict(model_averaging_results)  # averaged predicted values, etc.

Maybe model averaging is so easy, or so case-specific, that it doesn't need a separate function? If that's the case, some worked examples in the vignette would be really handy. As it stands, I am not really sure what to do with the LOO values calculated for my models!

Thanks

jgabry commented 6 years ago

Glad you like the package, and thanks especially for noticing the documentation; we spent a lot of time on that!

As for model averaging, I think a convenience function like that is a good idea. brms already has posterior_average, which we'll be putting into rstantools and rstanarm soon, so we're indeed thinking along the same lines. Thanks for the suggestion!

yao-yl commented 6 years ago

loo provides a way to estimate model weights, which are a lot like Akaike weights in terms of their interpretation (i.e. models with a weight near 1 are likely to be the 'best' model in the set) and intended use (i.e. the aim is then to average across models - NOT to simply pick a single top model). Hope that's right?

Yes, this is indeed the advantage of loo stacking.

Currently we do not have a separate summary function like the one you describe. One reason is that we allow each model to have a different parameter space.
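
The weights themselves can already be computed with the current loo interface. A minimal sketch (log_lik1 and log_lik2 are assumed placeholders for the S x N pointwise log-likelihood matrices of two fitted models, e.g. extracted with log_lik() from rstanarm or brms):

library(loo)

# log_lik1 / log_lik2: posterior draws (rows) by observations (columns)
wts <- loo_model_weights(list(log_lik1, log_lik2), method = "stacking")
print(wts)  # stacking weight of each model, summing to 1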

avehtari commented 6 years ago

@lukeholman the specific examples you show do not usually need model averaging in this way. What you seem to want is better handled by a suitable prior (like the horseshoe) and sampling. See first the AIC solution https://atyre2.github.io/2017/06/16/rebutting_cade.html and then the Bayesian solution https://rawgit.com/avehtari/modelselection_tutorial/master/collinear.html
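
As a rough illustration of that alternative (a sketch only; dat, y, x1 and x2 are assumed placeholders for your data), a single model with a regularized horseshoe prior in rstanarm:

library(rstanarm)

# One model with a sparsifying prior instead of averaging over submodels:
# hs() puts the regularized horseshoe prior on the regression coefficients.
fit <- stan_glm(y ~ x1 + x2, data = dat, prior = hs())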

Model averaging using stacking or LOO weights is for cases where it is not easy to form a continuous version of the model space and where we assume that none of the models is well specified.

Let's continue the discussion with @lukeholman on the Stan forum, and we'll get back here when it's clearer what is needed.

jgabry commented 6 years ago

Actually, instead of my previous suggestion of posterior_average(), I think I prefer predictive_average(), as it emphasizes that it's the posterior predictive distribution that is averaged, not the posterior draws. @paul-buerkner?

paul-buerkner commented 6 years ago

In brms, posterior_average averages posterior distributions (i.e. it draws from the posterior of each model in proportion to the model weights), while pp_average averages posterior predictive distributions.
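
Roughly like this (a sketch, assuming fit1 and fit2 are brmsfit objects fit to the same data):

library(brms)

# Average draws of the parameters shared by both models,
# weighting the models by their stacking weights:
post_avg <- posterior_average(fit1, fit2, weights = "stacking")

# Average the posterior predictive distributions instead:
pred_avg <- pp_average(fit1, fit2, weights = "stacking")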

avehtari commented 6 years ago

posterior_average averages posterior distributions (i.e. takes samples from the posterior of each model based on the model weights),

What is the use case for this? Different models have different parameter spaces or nonlinear mappings to an interpretable scale, so averaging posterior distributions rarely makes sense (I can come up with some strangely constrained cases, but I would not advertise them).

paul-buerkner commented 6 years ago

I agree. It basically mirrors the functionality of other (frequentist) packages that do model averaging. Not sure it was a good idea to implement this in the first place...

lukeholman commented 6 years ago

Hi all,

Thanks very much for the help!

I spent yesterday doing a bunch more reading, and I think that either horseshoe priors or the pp_average() and posterior_average() functions in brms cover my needs just fine, so I'm not sure a new loo function is needed. My only suggestion is to add a quick worked example to the end of the loo vignette, illustrating that the computed weights can easily be put to useful ends (in brms, and presumably also in rstanarm etc.); a rough sketch of what I mean follows the links below. For other people finding this later, I found the following posts especially informative:

http://mc-stan.org/projpred/articles/quickstart.html
https://drewtyre.rbind.io/post/rebutting_cade/
https://rawgit.com/avehtari/modelselection_tutorial/master/collinear.html
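
For example (a sketch only, not tested; fit1 and fit2 stand in for any two brmsfit objects fit to the same data):

library(brms)

# Compute stacking weights explicitly...
w <- model_weights(fit1, fit2, weights = "stacking")

# ...and pass them on to average the posterior predictive distributions.
pred_avg <- pp_average(fit1, fit2, weights = w)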

Cheers

jgabry commented 6 years ago

Thanks @lukeholman. I just opened #83 to help address adding examples and automating the process in brms and rstanarm.