mjskay / tidybayes

Bayesian analysis + tidy data + geoms (R package)
http://mjskay.github.io/tidybayes
GNU General Public License v3.0
719 stars 59 forks source link

mean_qi comparison to built in brms function #88

Closed tmastny closed 6 years ago

tmastny commented 6 years ago

When estimating parameter means using spread_samples and mean_qi I find that I get different results compared to the base brms functions.

Considering that the estimates should be calculated from the same set of Stan samples, I found the difference surprising. As you will see below, the difference is more than just a few decimal places, but the estimates are still roughly the same.

This might ultimately be a brms question, but I thought I'd post here and see if you had any ideas.

library(rethinking)
library(brms)
library(tidybayes)
library(tidyverse)

data("chimpanzees")
d <- chimpanzees
d$recipient <- NULL

# this model doesn't take long to fit
mod <- brm( pulled_left ~ 1 + (1 | actor) +
                      prosoc_left*condition - condition,
                    data = d, family = bernoulli(),
                    prior = c(set_prior("normal(0,10)", class = 'Intercept'),
                              set_prior("normal(0,10)", class = 'b'),
                              set_prior("cauchy(0,1)", class = 'sd',
                                        group = 'actor')))
# base brm parameter estimates
coef(mod)$actor[,,'Intercept']

#     Estimate Est.Error    2.5%ile   97.5%ile
# 1 -0.7151930 0.2710388 -1.2610062 -0.2096185
# 2  4.6289749 1.6508541  2.5556927  8.7364989
# 3 -1.0254793 0.2786557 -1.5779986 -0.4861515
# 4 -1.0184635 0.2739890 -1.5746687 -0.4992832
# 5 -0.7134285 0.2622040 -1.2502631 -0.2140437
# 6  0.2272997 0.2732106 -0.2982452  0.7733302
# 7  1.7638911 0.3983002  1.0368972  2.6044048

# tidybayes estimates
mod %>% spread_samples(r_actor[actor,]) %>% mean_qi()

# # A tibble: 7 x 5
# # Groups:   actor [7]
#   actor   r_actor   conf.low conf.high .prob
#   <int>     <dbl>      <dbl>     <dbl> <dbl>
# 1     1 -1.145886 -3.1283477 0.8790804  0.95
# 2     2  4.198282  1.6804917 8.5529938  0.95
# 3     3 -1.456172 -3.4376631 0.6182389  0.95
# 4     4 -1.449156 -3.4842968 0.5483304  0.95
# 5     5 -1.144121 -3.1296284 0.8784287  0.95
# 6     6 -0.203393 -2.2377362 1.7936313  0.95
# 7     7  1.333198 -0.6429028 3.4045263  0.95
mjskay commented 6 years ago

Good question! I believe The difference is that coef.brmsfit gives the conditional intercept (here, each actor's intercept, or as the documentation states, "the sum of population-level effects and corresponding group-level effects"), not just the coefficient of the random effect (here, the difference between each actor's intercept and the global intercept). The function ranef.brmsfit will give you the latter. So this:

ranef(mod)$actor
, , Intercept

    Estimate Est.Error   2.5%ile  97.5%ile
1 -1.1721686 0.9636567 -3.184252 0.5813610
2  4.1523900 1.6824437  1.766955 8.2339424
3 -1.4869005 0.9536372 -3.485597 0.2989344
4 -1.4736879 0.9541174 -3.487462 0.2552544
5 -1.1714235 0.9624689 -3.203741 0.5853726
6 -0.2388773 0.9539285 -2.274339 1.5177105
7  1.3017705 0.9833897 -0.754650 3.1563552

Is equivalent to this:

mod %>% 
  spread_samples(r_actor[actor,]) %>% 
  mean_qi()
# A tibble: 7 x 5
# Groups:   actor [7]
  actor    r_actor  conf.low conf.high .prob
  <int>      <dbl>     <dbl>     <dbl> <dbl>
1     1 -1.1721686 -3.184252 0.5813610  0.95
2     2  4.1523900  1.766955 8.2339424  0.95
3     3 -1.4869005 -3.485597 0.2989344  0.95
4     4 -1.4736879 -3.487462 0.2552544  0.95
5     5 -1.1714235 -3.203741 0.5853726  0.95
6     6 -0.2388773 -2.274339 1.5177105  0.95
7     7  1.3017705 -0.754650 3.1563552  0.95

While this:

coef(mod)$actor[,,"Intercept"]
    Estimate Est.Error    2.5%ile   97.5%ile
1 -0.7140058 0.2757255 -1.2552418 -0.1801244
2  4.6105528 1.6082396  2.5738480  8.7104328
3 -1.0287377 0.2819350 -1.5985323 -0.4864863
4 -1.0155251 0.2758217 -1.5446010 -0.4920529
5 -0.7132607 0.2657916 -1.2173259 -0.1953100
6  0.2192855 0.2691672 -0.2972306  0.7769189
7  1.7599333 0.3814252  1.0524398  2.5384523

Is equivalent to this:

mod %>%
  spread_samples(r_actor[actor,], b_Intercept) %>%
  mean_qi(r_actor + b_Intercept)
# A tibble: 7 x 5
# Groups:   actor [7]
  actor `r_actor + b_Intercept`   conf.low  conf.high .prob
  <int>                   <dbl>      <dbl>      <dbl> <dbl>
1     1              -0.7140058 -1.2552418 -0.1801244  0.95
2     2               4.6105528  2.5738480  8.7104328  0.95
3     3              -1.0287377 -1.5985323 -0.4864863  0.95
4     4              -1.0155251 -1.5446010 -0.4920529  0.95
5     5              -0.7132607 -1.2173259 -0.1953100  0.95
6     6               0.2192855 -0.2972306  0.7769189  0.95
7     7               1.7599333  1.0524398  2.5384523  0.95
tmastny commented 6 years ago

Thanks, I think this actually helps to clarify brms multilevel models for me.

For example (sorry, it doesn't seem possible to render LaTeX on a github comment), we could specify a multilevel model for varying (random) intercepts as follows:

           y_i ~ Bernoulli(p)
      logit(p) ~ a_{actor_i}
   a_{actor_i} ~ Normal(a, s)
             a ~ Normal(0, 1)
             s ~ Cauchy(0,1)

However, it seems like brms is doing the following:

           y_i ~ Bernoulli(p)
      logit(p) ~ a + a_{actor_i}
     a_{actor} ~ Normal(0, s)
             a ~ Normal(0, 1)
             s ~ Cauchy(0,1)

where each group level intercept is a deviation from a, based on the fact that coef is the sum of a and a_{actor_i}. Does that make sense? I couldn't find that explicitly in the documentation.

Also, neat trick with mean_qi(r_actor + b_Intercept)!

Edit:

I tested out this theory by explicitly removing the population intercept.

mod.test <- brm( pulled_left ~ -1 + (1 | actor) +
              prosoc_left*condition - condition,
            data = d, family = bernoulli(),
            prior = c(set_prior("normal(0,10)", class = 'b'),
                      set_prior("cauchy(0,1)", class = 'sd',
                                group = 'actor')))

coef(mod.test)$actor[,,'Intercept']
ranef(mod.test)$actor[,,'Intercept']

Both coef and ranef give the same results, and if brms is working as expected coef(mod)$actor[,,'Intercept'] of the previous model should also give the same answer. It turns out the results are similar, but slightly different. It is possible the difference arises because the models fit differently in Stan and have different samples.

This definitely seems like a brms question at this point, so I'll keep exploring. Thanks again.

mjskay commented 6 years ago

While I don't recall seeing anything specific on this in the brms docs either, the Stan manual discusses this kind of reparameterization in the section on "centered" versus "non-centered" parameterizations for efficiency in hierarchical models, also taking it a step further by separating out the scale as well as the location parameters --- I imagine brms is doing the same thing under the hood. I think you could output the stan code for the model and inspect it to verify this.

Also, neat trick with mean_qi(r_actor + b_Intercept)!

Thanks! :) Most of tidybayes now uses the tidyeval framework, enabling this sort of thing. This also means inline renaming of the output column works sort of like the way dplyr::mutate works, e.g.:

mod %>%
  spread_samples(r_actor[actor,], b_Intercept) %>%
  mean_qi(estimate = r_actor + b_Intercept)
# A tibble: 7 x 5
# Groups:   actor [7]
  actor   estimate   conf.low  conf.high .prob
  <int>      <dbl>      <dbl>      <dbl> <dbl>
1     1 -0.7140058 -1.2552418 -0.1801244  0.95
2     2  4.6105528  2.5738480  8.7104328  0.95
3     3 -1.0287377 -1.5985323 -0.4864863  0.95
4     4 -1.0155251 -1.5446010 -0.4920529  0.95
5     5 -0.7132607 -1.2173259 -0.1953100  0.95
6     6  0.2192855 -0.2972306  0.7769189  0.95
7     7  1.7599333  1.0524398  2.5384523  0.95