mjskay / tidybayes

Bayesian analysis + tidy data + geoms (R package)
http://mjskay.github.io/tidybayes
GNU General Public License v3.0
718 stars 59 forks source link

spread_draws with nested random term #171

Closed liamkendall closed 5 years ago

liamkendall commented 5 years ago

Hi Michael,

I am trying to extract tidy draws from a brmsfit object with a nested random structure but it throws the following error.

Apologises in advance if I am doing something wrong here...

See example below. example.txt

data <- read.table("example.txt [example.txt](https://github.com/mjskay/tidybayes/files/3200065/example.txt) ") `fs.out.obs.mod <- brm(yi~1+(1|publ_short/spp_grp), family="gaussian", seed = 123, prior = meta_fs.osb.priors, control=list(adapt_delta=0.999,max_treedepth=15), cores=4, iter = 2000, data=data)

get_variables(fs.out.obs.mod)

fs.out.obs.mod%>% spread_draws(b_Intercept,r_publ_short[publ_short,],r_publ_short:spp_grp[publ_short:spp_grp,])

Error in r_publ_short:spp_grp[publ_short:spp_grp, ] : NA/NaN argument In addition: Warning messages: 1: In r_publ_short:spp_grp[publ_short:spp_grp, ] : numerical expression has 3 elements: only the first used 2: In r_publ_short:spp_grp[publ_short:spp_grp, ] : numerical expression has 3 elements: only the first used

mjskay commented 5 years ago

Ah yeah, the problem is that r_publ_short:spp_grp is not parsed as a single variable name but two variables with : in between them. If you put backticks around the names with : in them it should sort of work:

fs.out.obs.mod %>%
  spread_draws(b_Intercept, r_publ_short[publ_short,], `r_publ_short:spp_grp`[`publ_short:spp_grp`,])

However, this won't quite give you what I assume you want --- it is going to give you a column called publ_short for the index in the r_publ_short variable and one called publ_short:spp_grp for the index in the r_publ_short:spp_grp variable. But it won't be able to properly match up the coefficients because the r_publ_short:spp_grp index won't be broken in two, so instead you'll get a bunch of rows for all combinations of the coefficients, and you'd have to separate the r_publ_short:spp_grp column into two and manually filter out rows that don't correspond to actual combinations seen in the data.

Fortunately there's a relatively small tweak we can do to avoid all that. brms uses _ as the separator in column names for interactions. You can see this by inspecting the output of get_variables(), which lists all the variables you can use within a call to spread_draws():

> get_variables(fs.out.obs.mod)
 [1] "b_Intercept"                                          "sd_publ_short__Intercept"                            
 [3] "sd_publ_short:spp_grp__Intercept"                     "sigma"                                               
 [5] "r_publ_short[Bell.,Intercept]"                        "r_publ_short[Campbell,Intercept]"                    
 [7] "r_publ_short[Cauic,Intercept]"                        "r_publ_short[Hiwak,Intercept]"                       
 [9] "r_publ_short[Marti2,Intercept]"                       "r_publ_short[Nicod,Intercept]"                       
[11] "r_publ_short[Nunes,Intercept]"                        "r_publ_short[Rolda,Intercept]"                       
[13] "r_publ_short[Sabar,Intercept]"                        "r_publ_short[Wilka,Intercept]"                       
[15] "r_publ_short[Zhang,Intercept]"                        "r_publ_short:spp_grp[Bell._Other.bees,Intercept]"    
[17] "r_publ_short:spp_grp[Campbell_Bombus,Intercept]"      "r_publ_short:spp_grp[Cauic_Stingless.bees,Intercept]"
[19] "r_publ_short:spp_grp[Hiwak_Bombus,Intercept]"         "r_publ_short:spp_grp[Hiwak_Stingless.bees,Intercept]"
[21] "r_publ_short:spp_grp[Marti2_Bombus,Intercept]"        "r_publ_short:spp_grp[Nicod_Apis.mellifera,Intercept]"
[23] "r_publ_short:spp_grp[Nicod_Stingless.bees,Intercept]" "r_publ_short:spp_grp[Nunes_Stingless.bees,Intercept]"
[25] "r_publ_short:spp_grp[Rolda_Bombus,Intercept]"         "r_publ_short:spp_grp[Sabar_Apis.mellifera,Intercept]"
[27] "r_publ_short:spp_grp[Wilka_Other.bees,Intercept]"     "r_publ_short:spp_grp[Zhang_Apis.mellifera,Intercept]"
[29] "r_publ_short:spp_grp[Zhang_Bombus,Intercept]"         "lp__"                                                
[31] "accept_stat__"                                        "stepsize__"                                          
[33] "treedepth__"                                          "n_leapfrog__"                                        
[35] "divergent__"                                          "energy__"  

Since there aren't any underscores in your factor level names (yay!), there is an easy fix: you can adjust the separator spread_draws() uses to tell it to use commas and underscores to differentiate between indices. Then if you extract the first half of the index on r_publ_short:spp_grp into a column with the name as the index on r_publ_short, then spread_draws() should automatically make sure everything lines up properly:

fs.out.obs.mod %>%
  spread_draws(b_Intercept, r_publ_short[publ_short,], `r_publ_short:spp_grp`[publ_short, spp_grp, ], sep = "[,_ ]")

Let me know if that gives what you're looking for.

mjskay commented 5 years ago

Also, to demonstrate my point about things lining up sensibly, given the second version (with modified sep) you can calculate conditional means of combinations actually in the data straightforwardly:

fs.out.obs.mod %>%
  spread_draws(b_Intercept, r_publ_short[publ_short,], `r_publ_short:spp_grp`[publ_short, spp_grp,], sep = "[,_ ]") %>%
  mutate(mu = b_Intercept + r_publ_short + `r_publ_short:spp_grp`) %>%
  median_qi(mu)
# A tibble: 14 x 8
# Groups:   publ_short [11]
   publ_short spp_grp             mu  .lower .upper .width .point .interval
   <chr>      <chr>            <dbl>   <dbl>  <dbl>  <dbl> <chr>  <chr>    
 1 Bell.      Other.bees     -1.73   -2.03   -1.23    0.95 median qi       
 2 Campbell   Bombus          0.254  -0.0982  0.612   0.95 median qi       
 3 Cauic      Stingless.bees  0.455   0.0633  0.838   0.95 median qi       
 4 Hiwak      Bombus          2.03    1.64    2.31    0.95 median qi       
 5 Hiwak      Stingless.bees  2.08    1.71    2.35    0.95 median qi       
 6 Marti2     Bombus          0.630   0.355   0.870   0.95 median qi       
 7 Nicod      Apis.mellifera  0.247  -0.0424  0.531   0.95 median qi       
 8 Nicod      Stingless.bees  0.146  -0.0845  0.403   0.95 median qi       
 9 Nunes      Stingless.bees  0.808   0.425   1.16    0.95 median qi       
10 Rolda      Bombus          0.519   0.260   0.786   0.95 median qi       
11 Sabar      Apis.mellifera  0.103  -0.257   0.483   0.95 median qi       
12 Wilka      Other.bees     -0.0396 -0.393   0.343   0.95 median qi       
13 Zhang      Apis.mellifera  1.04    0.718   1.33    0.95 median qi       
14 Zhang      Bombus          1.13    0.796   1.42    0.95 median qi 

Though personally if that's what I was looking for I'd probably use add_fitted_draws() as it is a little more straightforward and doesn't depend on doing the coefficient math yourself (so if the model changes it still gives the correct output):

data %>%
  modelr::data_grid(nesting(publ_short, spp_grp)) %>%
  add_fitted_draws(fs.out.obs.mod) %>%
  median_qi(.value)
# A tibble: 14 x 9
# Groups:   publ_short, spp_grp [44]
   publ_short spp_grp         .row  .value  .lower .upper .width .point .interval
   <fct>      <fct>          <int>   <dbl>   <dbl>  <dbl>  <dbl> <chr>  <chr>    
 1 "Bell "    Other bees         1 -1.73   -2.03   -1.23    0.95 median qi       
 2 Campbell   Bombus             2  0.254  -0.0982  0.612   0.95 median qi       
 3 Cauic      Stingless bees     3  0.455   0.0633  0.838   0.95 median qi       
 4 Hiwak      Bombus             4  2.03    1.64    2.31    0.95 median qi       
 5 Hiwak      Stingless bees     5  2.08    1.71    2.35    0.95 median qi       
 6 Marti2     Bombus             6  0.630   0.355   0.870   0.95 median qi       
 7 Nicod      Apis mellifera     7  0.247  -0.0424  0.531   0.95 median qi       
 8 Nicod      Stingless bees     8  0.146  -0.0845  0.403   0.95 median qi       
 9 Nunes      Stingless bees     9  0.808   0.425   1.16    0.95 median qi       
10 Rolda      Bombus            10  0.519   0.260   0.786   0.95 median qi       
11 Sabar      Apis mellifera    11  0.103  -0.257   0.483   0.95 median qi       
12 Wilka      Other bees        12 -0.0396 -0.393   0.343   0.95 median qi       
13 Zhang      Apis mellifera    13  1.04    0.718   1.33    0.95 median qi       
14 Zhang      Bombus            14  1.13    0.796   1.42    0.95 median qi 
mjskay commented 5 years ago

Closing this as it looks resolved