mjskay / tidybayes

Bayesian analysis + tidy data + geoms (R package)
http://mjskay.github.io/tidybayes
GNU General Public License v3.0
712 stars 59 forks source link

compare_levels additional contrasts #272

Closed kevingoneill closed 3 years ago

kevingoneill commented 3 years ago

The existing contrasts available for compare_levels are really great and cover most applications, but every once in a while I feel myself missing the ability to do contrasts like contr.poly or contr.sum with tidybayes. Is there any chance of getting that to work? I'm aware that emmeans can let me do this for simpler models, but it'd be nice to keep the flexibility that tidybayes offers.

mjskay commented 3 years ago

Yeah it's a good question.

Just to make sure I understand the question correctly, are you talking about a situation where you have a tidy data frame of draws from conditional means, and you want what would have been coefficients from one of those contrast schemes? E.g., say you have draws from conditional means for levels a, b, and c; you want to be able to be able to do something like get what the coefficients would have been under a sum-to-zero contrast scheme? Or are you in a situation where you have coefficients from a model that were generated using an arbitrary contrast scheme, and you want to be able to generate comparisons of levels?

The solution to the latter might involve using the contrast matrices to generate conditional means before invoking compare_levels. The solution to the former would be a bit more involved, and might mean creating a new function (or at the very least changing around how compare_levels handles custom comparison functions a bit). That's a bit more of an undertaking unfortunately.

If you have a minimal example of your use case that would help me think more concretely about what an API for this could look like.

kevingoneill commented 3 years ago

Thanks for the help! Unfortunately I'm thinking more towards the first case, where I want to generate arbitrary contrasts using my tidy grid of conditional means. Basically, I want to be able to replicate the following emmeans functionality with tidybayes:

df <- data.frame(x=rep(c('A', 'B', 'C', 'D'), each=25), y=rnorm(100))

m <- brms::brm(y ~ x, df)
draws <- df %>% modelr::data_grid(x) %>%
    add_fitted_draws(m)

## Pairwise Contrasts
emmeans(m, pairwise ~ x)
draws %>% compare_levels(.value, x, comparison='pairwise') %>% median_hdi()

## Treatment Contrasts
emmeans(m, trt.vs.ctrl ~ x)
draws %>% compare_levels(.value, x, comparison='control') %>% median_hdi()

## Sum Contrasts
emmeans(m, eff ~ x)
draws %>% compare_levels(.value, x, comparison='???') %>% median_hdi()

## Polynomial Contrasts
emmeans(m, poly ~ x)
draws %>% compare_levels(.value, x, comparison='???') %>% median_hdi()

As you can see, emmeans and tidybayes give basically the same thing for the pairwise/treatment contrasts, though presumably they're pretty different under the hood. It looks like emmeans now has a lot better support for brms models than since the last time I checked, so it's probably fine just doing these with emmeans and gather_emmeans_draws.

mjskay commented 3 years ago

Ah okay. Turns out a temporary solution was not as hard as I thought. If you check out the latest dev vesion:

devtools::install_github("mjskay/tidybayes@dev")

Then you can define this function, which translates emmeans comparison functions into tidybayes comparison functions:

emmeans_comparison = function(comparison_name) {
  comp_fun = getFromNamespace(paste0(comparison_name, ".emmc"), "emmeans")
  function(var) {
    comp_matrix = comp_fun(as.character(unique(var)))
    var_names = rownames(comp_matrix)
    lapply(comp_matrix, function(coefs) {
      purrr::reduce(purrr::map2(coefs, var_names, function(coef, var_name) {
        call("*", coef, as.name(var_name))
      }), ~ call("+", .x, .y))
    })
  }
}

For example, this emmeans comparison matrix:

emmeans:::del.eff.emmc(c("a","b","c"))
  a effect b effect c effect
a      1.0     -0.5     -0.5
b     -0.5      1.0     -0.5
c     -0.5     -0.5      1.0

Becomes these expressions for tidybayes comparisons:

emmeans_comparison("del.eff")(c("a","b","c"))
$`a effect`
1 * a + -0.5 * b + -0.5 * c

$`b effect`
-0.5 * a + 1 * b + -0.5 * c

$`c effect`
-0.5 * a + -0.5 * b + 1 * c

So you can do this:

draws %>% compare_levels(.value, x, comparison = emmeans_comparison("del.eff")) %>% median_hdi()
# A tibble: 4 x 7
  x         .value .lower .upper .width .point .interval
  <chr>      <dbl>  <dbl>  <dbl>  <dbl> <chr>  <chr>    
1 A effect -0.409  -0.878 0.0416   0.95 median hdi      
2 B effect  0.0636 -0.399 0.503    0.95 median hdi      
3 C effect  0.165  -0.282 0.638    0.95 median hdi      
4 D effect  0.177  -0.299 0.628    0.95 median hdi 

I will add some variant of emmeans_comparison to the API (so you don't have to define it yourself), but I will have to think a bit more about what the API should look like for it.

kevingoneill commented 3 years ago

Wow, that looks great! Thanks for the speedy response!