stan-dev / bayesplot

bayesplot R package for plotting Bayesian models
https://mc-stan.org/bayesplot
GNU General Public License v3.0
432 stars 82 forks source link

Feature request: multiple grouping variables #279

Open okenk opened 2 years ago

okenk commented 2 years ago

Hi, thanks so much for the package, it's a huge help! I am cross-posting this issue on the stan forums in case there is a solution I am missing. Essentially I am trying to visualize the posterior predictive distributions across two grouping variables instead of one. I am using bayesplot::ppc_violin_grouped() which is working like a charm for the first grouping variable, but then I'd like to facet by the second grouping variable.

I thought I might be able to just add this to an existing plot using facet_wrap(). I looked at the data component of the bayesplot object and first tried to emulate that. However, I got an error that the length was wrong. If I then change the length of the facet variable to what the error asks for, it then wants a different length. Minimally reproducible example:

mod <- brms::brm(mpg ~ cyl + gear, data = mtcars, iter = 200)
ppc <- brms::posterior_predict(mod)
base.plot <- bayesplot::ppc_violin_grouped(y = mtcars$mpg, yrep = ppc, group = mtcars$gear) 

# try to replicate structure of base.plot$data
base.plot +
  facet_wrap(~c(rep(mtcars$cyl, each = 400), mtcars$cyl))

# change based on resulting error
base.plot +
  facet_wrap(~rep(mtcars$cyl, each = 400))

# change based on resulting error
base.plot +
  facet_wrap(~mtcars$cyl)
jgabry commented 2 years ago

Yeah unfortunately it's not straightforward to add a second grouping variable right now, but this is definitely a feature I'd like to add at some point. Until then, I think you can kind of hack it. Here's one way:

library(bayesplot)
library(dplyr)  # will use %>% and left_join

# - use the ppc_data function to get the data baseplot used internally (this is different than the data 
# constructed by ggplot2 in base.plot$data)
# - add the second grouping variable (in this case cyl) and join by y_id
# - in the new data frame plot_data now has a variable "group2" now (it already had a variable "group")
plot_data <- 
  ppc_data(y = mtcars$mpg, yrep = ppc, group = mtcars$gear) %>%
  left_join(data.frame(group2 = mtcars$cyl, y_id = 1:nrow(mtcars)), by = "y_id")

head(plot_data)
# A tibble: 6 × 9
  group  y_id y_name rep_id rep_label            is_y  is_y_label     value group2
  <fct> <int> <fct>   <int> <fct>                <lgl> <fct>          <dbl>  <dbl>
1 4         1 1           1 italic(y)[rep] ( 1 ) FALSE italic(y)[rep]  21.4      6
2 4         1 1           2 italic(y)[rep] ( 2 ) FALSE italic(y)[rep]  17.3      6
3 4         1 1           3 italic(y)[rep] ( 3 ) FALSE italic(y)[rep]  18.3      6
4 4         1 1           4 italic(y)[rep] ( 4 ) FALSE italic(y)[rep]  18.2      6
5 4         1 1           5 italic(y)[rep] ( 5 ) FALSE italic(y)[rep]  22.2      6
6 4         1 1           6 italic(y)[rep] ( 6 ) FALSE italic(y)[rep]  20.6      6

Now we can replace the data in base.plot with plot_data and then you should be able to use facet_wrap or facet_grid with two variables:

# - use %+% to replace the data in base.plot with plot_data
# - then add facet_grid or facet_wrap (I'm not sure which you want here, maybe facet_grid actually)
base.plot %+% plot_data + facet_grid(c("group", "group2"))

All together the code is:

plot_data <- 
  ppc_data(y = mtcars$mpg, yrep = ppc, group = mtcars$gear) %>%
  left_join(data.frame(group2 = mtcars$cyl, y_id = 1:nrow(mtcars)), by = "y_id")
base.plot %+% plot_data + facet_wrap(c("group", "group2"))

There may be a better way to do this, but I think this should at least work. In the future hopefully we can add an argument to the plotting function for a second grouping variable or allow a different way to specify multiple grouping variables.

okenk commented 2 years ago

Thanks! Would be a cool feature for diagnosing model misspecifications, but in the meantime this worked.

shirdekel commented 1 year ago

@jgabry How would you do the same but for ppc_bars_grouped? I'm struggling to join the second grouping variable because I can't get an ID for each rep.