mjskay / tidybayes

Bayesian analysis + tidy data + geoms (R package)
http://mjskay.github.io/tidybayes
GNU General Public License v3.0

code example for multiple variables #225

Closed swood-ecology closed 3 years ago

swood-ecology commented 4 years ago

It would be great to have an example in the vignette of how to use tidybayes to generate a coefficient plot for multiple parameters, like the one in the dotwhisker vignette: https://cran.r-project.org/web/packages/dotwhisker/vignettes/dotwhisker-vignette.html

mjskay commented 4 years ago

Do you mean something like the example with subgroups + dodging here? https://mjskay.github.io/tidybayes/articles/slabinterval.html#on-sample-data-stat_gradientintervalh

Would have to think about where / how that could be integrated into the larger vignettes...

swood-ecology commented 4 years ago

What seems different to me about the stat_gradientintervalh example is that it shows how a variable y differs by group. As I understand it, in lmer syntax that would look like y ~ x + (1|g).

What I was wondering about is how to plot the multiple x's side by side for a model of this form: y ~ x1 + x2 + x3. In general, I can convert a wide-format data object to something long and grouped that would fit the stat_gradientintervalh example. But I'm not sure how to do that for an rstan object.
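
For draws that already sit in a wide data frame (one row per draw, one column per coefficient), the wide-to-long step described above can be sketched with tidyr::pivot_longer(). This is only an illustration: the draws are simulated, and the column names x1, x2, x3 and the objects draws_wide and draws_long are placeholders, not output from a fitted model.

```r
library(tidyr)

set.seed(1)

# Hypothetical wide-format draws: one row per draw, one column per coefficient.
# These values are simulated for illustration only.
draws_wide <- data.frame(
  .draw = 1:1000,
  x1 = rnorm(1000, mean = 0.5, sd = 0.2),
  x2 = rnorm(1000, mean = -0.3, sd = 0.1),
  x3 = rnorm(1000, mean = 0.0, sd = 0.3)
)

# Stack the coefficient columns into a long format:
# one row per (draw, coefficient) pair.
draws_long <- pivot_longer(
  draws_wide,
  cols = c(x1, x2, x3),
  names_to = ".variable",
  values_to = ".value"
)

head(draws_long)
```

The resulting .variable / .value columns can then be mapped to the y and x aesthetics in a ggplot call.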

mjskay commented 4 years ago

Makes sense. Given an rstan object, you could start by using tidy_draws(), spread_draws(), or gather_draws() to get a data frame with the variables you want. With supported model types like rstanarm and brms, you can also use add_fitted_draws() or add_predicted_draws() to get conditional means or posterior predictions. Then pipe the result into ggplot calls with the slabinterval geoms. There are examples of that kind of thing in the tidy-brms and tidy-rstanarm vignettes.

E.g. given a model like this:

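library(rstanarm)
library(tidybayes)  # get_variables(), gather_draws(), slabinterval geoms
library(dplyr)      # provides %>%
library(ggplot2)

m = stan_glm(hp ~ mpg + cyl + am, data = mtcars)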

You can see what variables it has:

get_variables(m)
 [1] "(Intercept)"   "mpg"           "cyl"           "am"            "sigma"         "accept_stat__" "stepsize__"    "treedepth__"   "n_leapfrog__"  "divergent__"   "energy__" 

and pull out variables into a long format using gather_draws():

m %>%
  gather_draws(mpg, cyl, am)
# A tibble: 12,000 x 5
# Groups:   .variable [3]
   .chain .iteration .draw .variable .value
    <int>      <int> <int> <chr>      <dbl>
 1      1          1     1 mpg        -6.53
 2      1          2     2 mpg        -4.62
 3      1          3     3 mpg        -2.72
 4      1          4     4 mpg        -4.60
 5      1          5     5 mpg        -6.33
 6      1          6     6 mpg        -3.21
 7      1          7     7 mpg        -2.24
 8      1          8     8 mpg        -5.28
 9      1          9     9 mpg        -5.80
10      1         10    10 mpg        -4.03
# ... with 11,990 more rows
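
If a table of estimates is wanted alongside the plot, the same long format can be passed to median_qi(), which returns one row per variable and interval width. The sketch below refits the model so that it runs on its own. The refresh = 0 argument (which silences sampler progress output), the 66% and 95% interval widths, and the name coef_summary are choices made here for illustration, not part of the original example.

```r
library(rstanarm)
library(tidybayes)
library(dplyr)

# Same model as above; refresh = 0 only suppresses sampler progress messages.
m <- stan_glm(hp ~ mpg + cyl + am, data = mtcars, refresh = 0)

# gather_draws() returns draws grouped by .variable, so median_qi()
# summarizes each coefficient separately, once per requested width.
coef_summary <- m %>%
  gather_draws(mpg, cyl, am) %>%
  median_qi(.value, .width = c(0.66, 0.95))

coef_summary
```

The .lower and .upper columns hold the interval bounds, and .width records which interval each row belongs to.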

Which you can pipe directly into ggplot and use with the slabinterval geoms:

m %>%
  gather_draws(mpg, cyl, am) %>%
  ggplot(aes(y = .variable, x = .value)) +
  stat_halfeyeh()

[image: half-eye plot of the posterior distributions of mpg, cyl, and am]
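
For something closer to a dotwhisker-style coefficient plot, one option is to swap the density slab for point-and-interval summaries and add a reference line at zero. This is a sketch under assumptions: it uses stat_pointinterval() without the "h" suffix, which relies on the automatic orientation detection added in later tidybayes releases (2.1 onward, as far as I know, where the h-suffixed variants such as stat_halfeyeh() were deprecated). On older versions the h-suffixed equivalent would be needed. The interval widths, the axis label, and the name p are illustrative choices.

```r
library(rstanarm)
library(tidybayes)
library(dplyr)
library(ggplot2)

# Same model as above; refresh = 0 only suppresses sampler progress messages.
m <- stan_glm(hp ~ mpg + cyl + am, data = mtcars, refresh = 0)

p <- m %>%
  gather_draws(mpg, cyl, am) %>%
  ggplot(aes(y = .variable, x = .value)) +
  # Point = median; thick and thin bars = 66% and 95% intervals.
  stat_pointinterval(.width = c(0.66, 0.95)) +
  # Reference line at zero, as in a typical coefficient plot.
  geom_vline(xintercept = 0, linetype = "dashed") +
  labs(x = "Coefficient estimate", y = NULL)

print(p)
```

Because the coefficients are on the scales of their predictors, the intervals can differ a lot in width. Standardizing predictors before fitting is one way to make them more comparable, though that is a modeling choice outside this example.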

Does that help?

mjskay commented 3 years ago

Closing this issue as old. Feel free to reopen if you still need help.