tidymodels / multilevelmod

Parsnip wrappers for mixed-level and hierarchical models
https://multilevelmod.tidymodels.org/
Other
74 stars 3 forks source link

extracting parameter sets of MCMC iterations from `model_fit` object #48

Closed joshyam-k closed 1 year ago

joshyam-k commented 1 year ago

This is a really fantastic package! When building bayesian multilevel models I often want to generate posterior predictive distributions that you can only create when you have access to all of the parameter sets from the MCMC iterations. I may just not be able to find how to do this, but if it isn't currently possible, it would be hugely helpful to be able to extract these when using this package to build the models.

Since the multilevelmod uses the engine stan_glm I'll show how you can get these parameter sets when building these models strictly with rstanarm::stan_glmer().

These data and code come from Chapter 17 of the BayesRules! book.

library(tidymodels)
library(tidyverse)
library(multilevelmod)
library(bayesrules)
library(rstanarm)

data(cherry_blossom_sample)
running <- cherry_blossom_sample

running_model_1 <- stan_glmer(
  net ~ age + (1 | runner), 
  data = running, family = gaussian,
  prior_intercept = normal(100, 10),
  prior = normal(2.5, 1), 
  prior_aux = exponential(1, autoscale = TRUE),
  prior_covariance = decov(reg = 1, conc = 1, shape = 1, scale = 1),
  chains = 4, iter = 5000*2, seed = 84735)

simply running

as.data.frame(running_model_1)

gives us the parameter sets for each final MCMC iteration.

So sorry if there is already a function that does this and I just couldn't find it!

hfrick commented 1 year ago

You can access the fitted stanreg model stored inside the parsnip model object with extract_fit_engine() and then call as.data.frame() on it like you did in your example 🙌

library(tidymodels)
library(multilevelmod)
library(bayesrules)
library(rstanarm)
#> [...]

data(cherry_blossom_sample)
running <- cherry_blossom_sample

running_model_1 <- stan_glmer(
  net ~ age + (1 | runner), 
  data = running, family = gaussian,
  prior_intercept = normal(100, 10),
  prior = normal(2.5, 1), 
  prior_aux = exponential(1, autoscale = TRUE),
  prior_covariance = decov(reg = 1, conc = 1, shape = 1, scale = 1),
  chains = 4, iter = 5000*2, seed = 84735)
#> [...]
coef_1 <- as.data.frame(running_model_1)

# with multilevelmod
running_model_2 <- linear_reg() %>% 
  set_engine("stan_glmer", family = gaussian,
             prior_intercept = normal(100, 10),
             prior = normal(2.5, 1), 
             prior_aux = exponential(1, autoscale = TRUE),
             prior_covariance = decov(reg = 1, conc = 1, shape = 1, scale = 1),
             chains = 4, iter = 5000*2, seed = 84735) %>% 
  fit(net ~ age + (1 | runner), data = running)

coef_2 <- running_model_2 %>% 
  extract_fit_engine() %>% 
  as.data.frame()

waldo::compare(coef_1, coef_2)
#> ✔ No differences

Created on 2022-11-14 with reprex v2.0.2

joshyam-k commented 1 year ago

ah, amazing!!! thank you so much!

github-actions[bot] commented 1 year ago

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.