mjskay / tidybayes

Bayesian analysis + tidy data + geoms (R package)
http://mjskay.github.io/tidybayes
GNU General Public License v3.0
726 stars 59 forks source link

Add population and group level parameter #158

Closed awellis closed 5 years ago

awellis commented 5 years ago

What is the obvious way to extract parameter estimates for each item of a grouping variable, if there are several population level parameters (and possibly interactions), and varying (group-level) effects? You explain how to do this for a varying intercept in your documentation, but it’s not clear to me how to generalise this.

For example (using brms), if my formula is

response ~ condition + (condition | subject)

the resulting variables will be

> get_variables(m)
 [1] "b_Intercept"                       
 [2] "b_conditionB"                      
 [3] "sd_subject__Intercept"             
 [4] "sd_subject__conditionB"            
 [5] "cor_subject__Intercept__conditionB"
 [6] "sigma"                             
 [7] "r_subject[F30,Intercept]"          
 [8] "r_subject[F71,Intercept]"          
 [9] "r_subject[F30,conditionB]"         
[10] "r_subject[F71,conditionB]”

(There are only two subjects)

I can get the draws

draws_var <- m %>%
    spread_draws(r_subject[subject,term]) %>% 
    spread(term, r_subject)

draws_pop <- m %>%
    spread_draws(b_Intercept, b_conditionB)

draws <- draws_var %>%
    group_by(subject) %>%
    left_join(draws_pop) %>% 
    mutate(alpha = b_Intercept + Intercept,
           beta = b_conditionB + conditionB)

but this seems seems a rather roundabout way of adding together the population and group level effects.

Is there some intended usage that I am missing?

Here is the R code (dataset makes no sense, just for illustration):

library(tidyverse)
library(modelr)
library(brms)
library(tidybayes)

n_obs <- 5
n_condition <- 2
n_subjects <- 2
sigma <- 0.5

make_subject_data <- function(n_condition = 2, n_obs = 5) {
    tibble(subject = paste0(sample(LETTERS, 1), floor(runif(1, 10, 100))),
           condition = rep(LETTERS[1:n_condition], n_obs),
           response = rnorm(n_obs * n_condition, c(1,-1), sigma))
}

set.seed(5)
AB <- bind_rows(replicate(n_subjects,
                           make_subject_data(n_condition, n_obs),
                           simplify = FALSE)) %>%
    mutate(subject = as.factor(subject),
           condition = as.factor(condition))

head(AB)

AB %>%
    ggplot(aes(y = condition, x = response)) +
    geom_point() +
    facet_wrap(~subject)

m <- brm(response ~ condition + (condition|subject),
         data = AB)

draws_var <- m %>%
    spread_draws(r_subject[subject,term]) %>% 
    spread(term, r_subject)

draws_pop <- m %>%
    spread_draws(b_Intercept, b_conditionB)

draws <- draws_var %>%
    group_by(subject) %>%
    left_join(draws_pop) %>% 
    mutate(alpha = b_Intercept + Intercept,
           beta = b_conditionB + conditionB)

This is just an example, what I am really trying to do is to estimate d’ (signal detection) for each individual subject in a multilevel model.

Best wishes, Andrew

mjskay commented 5 years ago

There are a few directions for simplification. The first thing to note is that this:

draws_var <- m %>%
    spread_draws(r_subject[subject,term]) %>% 
    spread(term, r_subject)

Is equivalent to using | within spread_draws:

draws_var <- m %>%
    spread_draws(r_subject[subject,term] | term)

Then the other thing is that spread_draws will do the join for you as long as you include all the variables you want in a single call, so you can combine the entire definition of draws plus the mutate into this:

draws <- m %>%
  spread_draws(b_Intercept, b_conditionB, r_subject[subject,term] | term) %>%
  mutate(
    alpha = b_Intercept + Intercept,
    beta = b_conditionB + conditionB
  )

That said, personally I probably wouldn't do this by munging parameters manually, since that tends to get messy when you do things like add more conditions. Instead, since what you need to derive alpha and beta are means conditional on subject and condition, I would consider using modelr::data_grid combined with add_fitted_draws to get the conditional means and go from there.

E.g. this:

AB %>%
  modelr::data_grid(condition, subject) %>%
  add_fitted_draws(m)

Which will give you a column .value containing draws from the mean conditional on condition and subject. Then alpha and beta are something like (ungrouping and dropping .row are needed because otherwise the spread breaks):

AB %>%
  modelr::data_grid(condition, subject) %>%
  add_fitted_draws(m) %>%
  ungroup() %>%
  select(-.row) %>%
  spread(condition, .value) %>%
  mutate(
    alpha = A,
    beta = B - A
  )

Does that help?

awellis commented 5 years ago

Hi, thanks for your reply, it's very helpful, and thanks in general for this package.

I haven't had much time to try this out, as in the end, I used brms::coef() to extract the population + group level parameter estimates. I will try this out and get back to you.

mjskay commented 5 years ago

Great, thanks! I'll close this for now, but please do let me know if you end up trying out this (or anything else) and it doesn't do what you need.