rvlenth / emmeans

Estimated marginal means
https://rvlenth.github.io/emmeans/
341 stars 30 forks source link

Optional configurable delimiter in the column means of as.mcmc() output #425

Closed wlandau closed 1 year ago

wlandau commented 1 year ago

I am co-developing a modeling package on top of brms and emmeans. We love how as.mcmc(emmeans(object = brms_model)) just works and gives us posterior samples of the marginal means.

If possible, we would like to be able to choose the delimiter in the column names of the mcmc objects. From the code, it looks like emmeans only supports ", ".

https://github.com/rvlenth/emmeans/blob/bfe96688a0e7c967bf89f9922c7cf529261f5990/R/MCMC-support.R#L163

This is fine for almost all scenarios, but just in case the factor levels in the data also have commas, it would be great to choose the delimiter ourselves.

Reprex:

library(brms)
#> Loading required package: Rcpp
#> Loading 'brms' package (version 2.19.0). Useful instructions
#> can be found by typing help('brms'). A more detailed introduction
#> to the package is available through vignette('brms_overview').
#> 
#> Attaching package: 'brms'
#> The following object is masked from 'package:stats':
#> 
#>     ar
library(coda)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(emmeans)
library(tidyr)
data <- expand_grid(
  expand_grid(
    factor1 = paste0("level_", letters[seq_len(4L)]),
    factor2 = paste0("level_", LETTERS[seq_len(4L)])
  ),
  observation = seq_len(100L)
) %>%
  mutate(response = rnorm(n = n()))

data
#> # A tibble: 1,600 × 4
#>    factor1 factor2 observation response
#>    <chr>   <chr>         <int>    <dbl>
#>  1 level_a level_A           1   0.551 
#>  2 level_a level_A           2   0.948 
#>  3 level_a level_A           3   0.255 
#>  4 level_a level_A           4   0.111 
#>  5 level_a level_A           5  -0.113 
#>  6 level_a level_A           6   0.401 
#>  7 level_a level_A           7  -0.0605
#>  8 level_a level_A           8   0.920 
#>  9 level_a level_A           9  -0.542 
#> 10 level_a level_A          10  -0.378 
#> # ℹ 1,590 more rows

formula <- brmsformula(response ~ factor1 * factor2)
model <- brm(formula = formula, data = data, refresh = 0L)
#> Compiling Stan program...
#> Start sampling
means <- emmeans(object = model, specs = ~factor1:factor2)
mcmc <- as.mcmc(means, fixed = TRUE, names = FALSE)

colnames(mcmc[[1L]])
#>  [1] "level_a, level_A" "level_b, level_A" "level_c, level_A" "level_d, level_A"
#>  [5] "level_a, level_B" "level_b, level_B" "level_c, level_B" "level_d, level_B"
#>  [9] "level_a, level_C" "level_b, level_C" "level_c, level_C" "level_d, level_C"
#> [13] "level_a, level_D" "level_b, level_D" "level_c, level_D" "level_d, level_D"

Created on 2023-06-06 with reprex v2.0.2

Session info ``` r sessioninfo::session_info() #> ─ Session info ─────────────────────────────────────────────────────────────── #> setting value #> version R version 4.3.0 (2023-04-21) #> os macOS Ventura 13.4 #> system aarch64, darwin20 #> ui X11 #> language (EN) #> collate en_US.UTF-8 #> ctype en_US.UTF-8 #> tz America/Indiana/Indianapolis #> date 2023-06-06 #> pandoc 3.1.2 @ /usr/local/bin/ (via rmarkdown) #> #> ─ Packages ─────────────────────────────────────────────────────────────────── #> package * version date (UTC) lib source #> abind 1.4-5 2016-07-21 [1] CRAN (R 4.3.0) #> backports 1.4.1 2021-12-13 [1] CRAN (R 4.3.0) #> base64enc 0.1-3 2015-07-28 [1] CRAN (R 4.3.0) #> bayesplot 1.10.0 2022-11-16 [1] CRAN (R 4.3.0) #> bridgesampling 1.1-2 2021-04-16 [1] CRAN (R 4.3.0) #> brms * 2.19.0 2023-03-14 [1] CRAN (R 4.3.0) #> Brobdingnag 1.2-9 2022-10-19 [1] CRAN (R 4.3.0) #> callr 3.7.3 2022-11-02 [1] CRAN (R 4.3.0) #> checkmate 2.2.0 2023-04-27 [1] CRAN (R 4.3.0) #> cli 3.6.1 2023-03-23 [1] CRAN (R 4.3.0) #> coda * 0.19-4 2020-09-30 [1] CRAN (R 4.3.0) #> codetools 0.2-19 2023-02-01 [1] CRAN (R 4.3.0) #> colorspace 2.1-0 2023-01-23 [1] CRAN (R 4.3.0) #> colourpicker 1.2.0 2022-10-28 [1] CRAN (R 4.3.0) #> crayon 1.5.2 2022-09-29 [1] CRAN (R 4.3.0) #> crosstalk 1.2.0 2021-11-04 [1] CRAN (R 4.3.0) #> digest 0.6.31 2022-12-11 [1] CRAN (R 4.3.0) #> distributional 0.3.2 2023-03-22 [1] CRAN (R 4.3.0) #> dplyr * 1.1.2 2023-04-20 [1] CRAN (R 4.3.0) #> DT 0.27 2023-01-17 [1] CRAN (R 4.3.0) #> dygraphs 1.1.1.6 2018-07-11 [1] CRAN (R 4.3.0) #> ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.3.0) #> emmeans * 1.8.6 2023-05-11 [1] CRAN (R 4.3.0) #> estimability 1.4.1 2022-08-05 [1] CRAN (R 4.3.0) #> evaluate 0.21 2023-05-05 [1] CRAN (R 4.3.0) #> fansi 1.0.4 2023-01-22 [1] CRAN (R 4.3.0) #> farver 2.1.1 2022-07-06 [1] CRAN (R 4.3.0) #> fastmap 1.1.1 2023-02-24 [1] CRAN (R 4.3.0) #> fs 1.6.2 2023-04-25 [1] CRAN (R 4.3.0) #> generics 0.1.3 2022-07-05 [1] CRAN (R 4.3.0) #> ggplot2 3.4.2 2023-04-03 [1] CRAN (R 4.3.0) #> glue 1.6.2 2022-02-24 [1] CRAN (R 4.3.0) #> gridExtra 2.3 2017-09-09 [1] CRAN (R 4.3.0) #> gtable 0.3.3 2023-03-21 [1] CRAN (R 4.3.0) #> gtools 3.9.4 2022-11-27 [1] CRAN (R 4.3.0) #> htmltools 0.5.5 2023-03-23 [1] CRAN (R 4.3.0) #> htmlwidgets 1.6.2 2023-03-17 [1] CRAN (R 4.3.0) #> httpuv 1.6.11 2023-05-11 [1] CRAN (R 4.3.0) #> igraph 1.4.2 2023-04-07 [1] CRAN (R 4.3.0) #> inline 0.3.19 2021-05-31 [1] CRAN (R 4.3.0) #> knitr 1.42 2023-01-25 [1] CRAN (R 4.3.0) #> later 1.3.1 2023-05-02 [1] CRAN (R 4.3.0) #> lattice 0.21-8 2023-04-05 [1] CRAN (R 4.3.0) #> lifecycle 1.0.3 2022-10-07 [1] CRAN (R 4.3.0) #> loo 2.6.0 2023-03-31 [1] CRAN (R 4.3.0) #> magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.3.0) #> markdown 1.7 2023-05-16 [1] CRAN (R 4.3.0) #> Matrix 1.5-4 2023-04-04 [1] CRAN (R 4.3.0) #> matrixStats 0.63.0 2022-11-18 [1] CRAN (R 4.3.0) #> mime 0.12 2021-09-28 [1] CRAN (R 4.3.0) #> miniUI 0.1.1.1 2018-05-18 [1] CRAN (R 4.3.0) #> munsell 0.5.0 2018-06-12 [1] CRAN (R 4.3.0) #> mvtnorm 1.1-3 2021-10-08 [1] CRAN (R 4.3.0) #> nlme 3.1-162 2023-01-31 [1] CRAN (R 4.3.0) #> pillar 1.9.0 2023-03-22 [1] CRAN (R 4.3.0) #> pkgbuild 1.4.0 2022-11-27 [1] CRAN (R 4.3.0) #> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.3.0) #> plyr 1.8.8 2022-11-11 [1] CRAN (R 4.3.0) #> posterior 1.4.1 2023-03-14 [1] CRAN (R 4.3.0) #> prettyunits 1.1.1 2020-01-24 [1] CRAN (R 4.3.0) #> processx 3.8.1 2023-04-18 [1] CRAN (R 4.3.0) #> promises 1.2.0.1 2021-02-11 [1] CRAN (R 4.3.0) #> ps 1.7.5 2023-04-18 [1] CRAN (R 4.3.0) #> purrr 1.0.1 2023-01-10 [1] CRAN (R 4.3.0) #> R6 2.5.1 2021-08-19 [1] CRAN (R 4.3.0) #> Rcpp * 1.0.10 2023-01-22 [1] CRAN (R 4.3.0) #> RcppParallel 5.1.7 2023-02-27 [1] CRAN (R 4.3.0) #> reprex 2.0.2 2022-08-17 [1] CRAN (R 4.3.0) #> reshape2 1.4.4 2020-04-09 [1] CRAN (R 4.3.0) #> rlang 1.1.1 2023-04-28 [1] CRAN (R 4.3.0) #> rmarkdown 2.21 2023-03-26 [1] CRAN (R 4.3.0) #> rstan 2.21.8 2023-01-17 [1] CRAN (R 4.3.0) #> rstantools 2.3.1 2023-03-30 [1] CRAN (R 4.3.0) #> rstudioapi 0.14 2022-08-22 [1] CRAN (R 4.3.0) #> scales 1.2.1 2022-08-20 [1] CRAN (R 4.3.0) #> sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.3.0) #> shiny 1.7.4 2022-12-15 [1] CRAN (R 4.3.0) #> shinyjs 2.1.0 2021-12-23 [1] CRAN (R 4.3.0) #> shinystan 2.6.0 2022-03-03 [1] CRAN (R 4.3.0) #> shinythemes 1.2.0 2021-01-25 [1] CRAN (R 4.3.0) #> StanHeaders 2.21.0-7 2020-12-17 [1] CRAN (R 4.3.0) #> stringi 1.7.12 2023-01-11 [1] CRAN (R 4.3.0) #> stringr 1.5.0 2022-12-02 [1] CRAN (R 4.3.0) #> tensorA 0.36.2 2020-11-19 [1] CRAN (R 4.3.0) #> threejs 0.3.3 2020-01-21 [1] CRAN (R 4.3.0) #> tibble 3.2.1 2023-03-20 [1] CRAN (R 4.3.0) #> tidyr * 1.3.0 2023-01-24 [1] CRAN (R 4.3.0) #> tidyselect 1.2.0 2022-10-10 [1] CRAN (R 4.3.0) #> utf8 1.2.3 2023-01-31 [1] CRAN (R 4.3.0) #> vctrs 0.6.2 2023-04-19 [1] CRAN (R 4.3.0) #> withr 2.5.0 2022-03-03 [1] CRAN (R 4.3.0) #> xfun 0.39 2023-04-20 [1] CRAN (R 4.3.0) #> xtable 1.8-4 2019-04-21 [1] CRAN (R 4.3.0) #> xts 0.13.1 2023-04-16 [1] CRAN (R 4.3.0) #> yaml 2.3.7 2023-01-23 [1] CRAN (R 4.3.0) #> zoo 1.8-12 2023-04-13 [1] CRAN (R 4.3.0) #> #> [1] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library #> #> ────────────────────────────────────────────────────────────────────────────── ```
rvlenth commented 1 year ago

Thanks for your suggestion. I think the best approach is probably to create an option, say "fac.comb.sep" that defaults to " ," and is used wherever we need to label factor combinations. Then the user can just change that option using emm_options(). This is in theory simple to do; however, finding all those places where I paste together labels like this is a bit tedious, and I am about to leave on a vacation. So it will be a couple of weeks before I actually address this.

billdenney commented 1 year ago

This is tangentially related to #209. While looking at the fix for this, does it make sense to add a more general option for combination and formatting of labels for contrasts, too? (This isn't a strong push-- I just needed the solution in #209 again today, and I saw this issue.)

rvlenth commented 1 year ago

That's my plan already.

rvlenth commented 1 year ago

Hmmm, well, this is a case of "my left hand doesn't know what my right hand is doing" -- or worse yet, "my right hand doesn't know what my right hand already did." It turns out that I had already implemented a "sep" option that works with contrast() and various plotting functions (see ? emm_options), so all I had to do is use it in as.mcmc(). (And to answer @billdenney 's comment, just demonstrate it.)

So I did that and here's a reprex:

library(emmeans)
library(coda)
set.seed (23.0619)

### Quick-and-dirty fake MCMC example
mod <- lm(breaks ~ wool * tension, data = warpbreaks)
rg <- regrid(ref_grid(mod), N.sim = 100)
## Simulating a sample of size 100 of regression coefficients.

### Default formatting ------------------------------------------------
pairs(rg)
##  contrast  estimate lower.HPD upper.HPD
##  A L - B L   16.750     7.149     28.89
##  A L - A M   20.386    11.331     30.05
##  A L - B M   15.617     3.811     25.40
##  A L - A H   19.895     8.882     27.95
##  A L - B H   25.484    15.736     35.65
##  B L - A M    3.611    -5.271     14.73
##  B L - B M   -1.378   -10.989      7.73
##  B L - A H    3.881    -6.523     11.85
##  B L - B H    8.656    -1.631     16.38
##  A M - B M   -5.661   -17.927      3.41
##  A M - A H   -0.163    -8.970      8.93
##  A M - B H    4.759    -5.120     15.93
##  B M - A H    4.525    -5.120     13.60
##  B M - B H    9.608     0.415     22.26
##  A H - B H    5.411    -4.622     17.11
## 
## Point estimate displayed: median 
## HPD interval probability: 0.95

colnames(as.mcmc(rg))
## [1] "wool A tension L" "wool B tension L" "wool A tension M" "wool B tension M"
## [5] "wool A tension H" "wool B tension H"

### Change sep to " & " ---------------------------------------------------
emm_options(sep = " & ")
pairs(rg)
##  contrast      estimate lower.HPD upper.HPD
##  A & L - B & L   16.750     7.149     28.89
##  A & L - A & M   20.386    11.331     30.05
##  A & L - B & M   15.617     3.811     25.40
##  A & L - A & H   19.895     8.882     27.95
##  A & L - B & H   25.484    15.736     35.65
##  B & L - A & M    3.611    -5.271     14.73
##  B & L - B & M   -1.378   -10.989      7.73
##  B & L - A & H    3.881    -6.523     11.85
##  B & L - B & H    8.656    -1.631     16.38
##  A & M - B & M   -5.661   -17.927      3.41
##  A & M - A & H   -0.163    -8.970      8.93
##  A & M - B & H    4.759    -5.120     15.93
##  B & M - A & H    4.525    -5.120     13.60
##  B & M - B & H    9.608     0.415     22.26
##  A & H - B & H    5.411    -4.622     17.11
## 
## Point estimate displayed: median 
## HPD interval probability: 0.95

colnames(as.mcmc(rg))
## [1] "wool A & tension L" "wool B & tension L" "wool A & tension M"
## [4] "wool B & tension M" "wool A & tension H" "wool B & tension H"

### Yet another sep -------------------------------------------------------
emm_options(sep = ":")
colnames(as.mcmc(rg, names = FALSE))
## [1] "A:L" "B:L" "A:M" "B:M" "A:H" "B:H"

Created on 2023-06-20 with reprex v2.0.2

This change will be included in the next CRAN update, which will be by the end of June 2023, I think

rvlenth commented 1 year ago

Note - a possibly unfortunate side-effect is that the default sep has changed from ", " to " ". This may mess-up any existing code that parses the results based on an expectation that ", " is used. I hope that there is no such existing code in another package.

rvlenth commented 1 year ago

I think this issue is resolved, so am closing it. Thanks for reporting this.

wlandau commented 1 year ago

Thanks @rvlenth!