rstudio / gt

Easily generate information-rich, publication-quality tables from R
https://gt.rstudio.com

`cols_width()` not working when using `cols_merge()` for PDF output #1837

Open snhansen opened 1 month ago

snhansen commented 1 month ago

Description

Widths set with cols_width() aren't respected when cols_merge() is also used and the output is PDF.

Reproducible example

Consider this Quarto document:

---
format: pdf
---

```{r}
#| echo: false
library(gt)
sp500 |>
  dplyr::slice(50:55) |>
  dplyr::select(-volume, -adj_close) |>
  gt() |>
  cols_align(
    columns = everything(),
    align = "right"
  ) |>
  cols_merge(
    columns = c(open, close),
    pattern = "{1}-{2}"
  ) |>
  cols_merge(
    columns = c(low, high),
    pattern = "{1}-{2}"
  ) |>
  cols_width(
    date ~ px(100),
    c("open", "low") ~ px(200)
  )
```

Expected result

The table created doesn't respect the widths set:

image

When output as HTML, everything looks fine:

image

Session info

> sessionInfo()
R version 4.4.0 (2024-04-24 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 10 x64 (build 19045)

Matrix products: default

locale:
[1] LC_COLLATE=Danish_Denmark.utf8     LC_CTYPE=Danish_Denmark.utf8       LC_MONETARY=Danish_Denmark.utf8    LC_NUMERIC=C                      
[5] LC_TIME=English_United States.1252

time zone: Europe/Copenhagen
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] marginaleffects_0.20.1 patchwork_1.2.0        glue_1.7.0             parameters_0.21.7      ggpomological_0.1.2    epoxy_1.0.0           
 [7] gt_0.11.0              lubridate_1.9.3        forcats_1.0.0          stringr_1.5.1          dplyr_1.1.4            purrr_1.0.2           
[13] readr_2.1.5            tidyr_1.3.1            tibble_3.2.1           ggplot2_3.5.1          tidyverse_2.0.0       

loaded via a namespace (and not attached):
 [1] sass_0.4.9         utf8_1.2.4         generics_0.1.3     xml2_1.3.6         lattice_0.22-6     stringi_1.8.4      hms_1.1.3          digest_0.6.35     
 [9] magrittr_2.0.3     evaluate_0.23      grid_4.4.0         timechange_0.3.0   estimability_1.5.1 mvtnorm_1.2-5      fastmap_1.2.0      processx_3.8.4    
[17] ps_1.7.6           fansi_1.0.6        scales_1.3.0       cli_3.6.2          rlang_1.1.4        reprex_2.1.0       munsell_0.5.1      commonmark_1.9.1  
[25] yaml_2.3.8         withr_3.0.0        tools_4.4.0        datawizard_0.11.0  tzdb_0.4.0         coda_0.19-4.1      colorspace_2.1-0   bayestestR_0.13.2 
[33] vctrs_0.6.5        R6_2.5.1           lifecycle_1.0.4    emmeans_1.10.2     fs_1.6.4           insight_0.20.0     callr_3.7.6        clipr_0.8.0       
[41] pkgconfig_2.0.3    pillar_1.9.0       gtable_0.3.5       Rcpp_1.0.12        data.table_1.15.4  xfun_0.44          tidyselect_1.2.1   knitr_1.47        
[49] rstudioapi_0.16.0  xtable_1.8-4       htmltools_0.5.8.1  rmarkdown_2.27     compiler_4.4.0     markdown_1.13

nielsbock commented 1 month ago

Just want to add that cols_width() is respected in PDF (but not HTML) output in the above example if "low" is replaced with "high", like this:

  cols_width(
    date ~ px(100),
    c("open", "high") ~ px(200)
  )

So it seems that the label is misattributed in LaTeX. Maybe this can help with debugging.

I've had a related issue with LaTeX when creating tables with gtsummary and converting them to gt using as_gt(). I couldn't modify the column widths with cols_width() until I realised, after using tab_info(), that the labels did not match the column names in the table. Specifying the column widths eventually worked after some trial and error with the labels shown by tab_info().

snhansen commented 1 month ago

Good catch. I looked a bit more into it, and from what I can see the issue is with the create_table_start_l() function in utils_render_latex.R.

Indeed, if we call the example table ex_table, we get the following:

data <- gt:::build_data(data = ex_table, context = "latex")
colwidth_df <- gt:::create_colwidth_df_l(data = data)
colwidth_df
#>      type unspec lw  pt tbl_width
#> 1 default      0  0  75        NA
#> 2 default      0  0 150        NA
#> 3  hidden      1  0   0        NA
#> 4 default      0  0 150        NA
#> 5  hidden      1  0   0        NA

So the widths are calculated correctly for the correct columns, but create_table_start_l() translates them into LaTeX incorrectly: the third visible column gets a plain r specification instead of a 150pt p{} specification.

gt:::create_table_start_l(data = data, colwidth_df = colwidth_df)
#> [1] "\\begin{longtable}{>{\\raggedleft\\arraybackslash}p{\\dimexpr 75.00pt -2\\tabcolsep-1.5\\arrayrulewidth}>{\\raggedleft\\arraybackslash}p{\\dimexpr 150.00pt -2\\tabcolsep-1.5\\arrayrulewidth}r}\n"

snhansen commented 1 month ago

Did a bit more investigating, and it's this part of the code that's not working properly (lines 179-214):

  if (any(colwidth_df$unspec < 1L)) {

    col_defs <- NULL

    for (i in seq_along(col_alignment)) {

      if (colwidth_df$unspec[i] == 1L) {
        col_defs_i <- substr(col_alignment[i], 1, 1)
      } else {

        align <-
          switch(
            col_alignment[i],
            left = ">{\\raggedright\\arraybackslash}",
            right = ">{\\raggedleft\\arraybackslash}",
            center = ">{\\centering\\arraybackslash}",
            ">{\\raggedright\\arraybackslash}"
          )

        col_defs_i <-
          paste0(
            align,
            "p{",
            create_singlecolumn_width_text_l(pt = colwidth_df$pt[i], lw = colwidth_df$lw[i]),
            "}"
          )

      }

      col_defs <- c(col_defs, col_defs_i)
    }

  } else {

    col_defs <- substr(col_alignment, 1, 1)
  }

because col_alignment only contains visible columns whereas colwidth_df contains both visible and invisible columns. A fix would be to get rid of the invisible columns of colwidth_df:

  if (any(colwidth_df$unspec < 1L)) {

    col_defs <- NULL
    colwidth_df_visible <- colwidth_df[colwidth_df$type != "hidden", ]

    for (i in seq_along(col_alignment)) {

      if (colwidth_df_visible$unspec[i] == 1L) {
        col_defs_i <- substr(col_alignment[i], 1, 1)
      } else {

        align <-
          switch(
            col_alignment[i],
            left = ">{\\raggedright\\arraybackslash}",
            right = ">{\\raggedleft\\arraybackslash}",
            center = ">{\\centering\\arraybackslash}",
            ">{\\raggedright\\arraybackslash}"
          )

        col_defs_i <-
          paste0(
            align,
            "p{",
            create_singlecolumn_width_text_l(pt = colwidth_df_visible$pt[i], lw = colwidth_df_visible$lw[i]),
            "}"
          )

      }

      col_defs <- c(col_defs, col_defs_i)
    }

  } else {

    col_defs <- substr(col_alignment, 1, 1)
  }
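
To make the index mismatch concrete, here is a minimal base-R sketch using the colwidth_df values printed earlier in this thread. It is not gt's code: width_text() is a hypothetical, simplified stand-in for create_singlecolumn_width_text_l(), the alignment prefixes are left out, and col_alignment is assumed to hold one entry per visible column.

```r
# Column-width table copied from the colwidth_df output shown earlier.
colwidth_df <- data.frame(
  type   = c("default", "default", "hidden", "default", "hidden"),
  unspec = c(0L, 0L, 1L, 0L, 1L),
  pt     = c(75, 150, 0, 150, 0),
  stringsAsFactors = FALSE
)

# Assumption: one alignment entry per visible column (three here).
col_alignment <- c("right", "right", "right")

# Hypothetical, simplified stand-in for create_singlecolumn_width_text_l().
width_text <- function(pt) sprintf("%.2fpt", pt)

# Same indexing pattern as the loop above: row i of `widths` is paired
# with element i of `alignment`.
build_col_defs <- function(widths, alignment) {
  vapply(seq_along(alignment), function(i) {
    if (widths$unspec[i] == 1L) {
      substr(alignment[i], 1, 1)
    } else {
      paste0("p{", width_text(widths$pt[i]), "}")
    }
  }, character(1))
}

# Unfiltered: row 3 is the hidden column, so the third visible column
# falls back to a bare alignment letter.
buggy <- build_col_defs(colwidth_df, col_alignment)

# Filtered: rows and alignment entries line up again.
fixed <- build_col_defs(colwidth_df[colwidth_df$type != "hidden", ], col_alignment)

print(buggy)
print(fixed)
```

With the unfiltered data frame the third definition comes out as "r", which matches the longtable preamble shown earlier; after dropping the hidden rows it becomes a 150pt p{} column.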

EDIT: The above doesn't work with stubs/row_groups introduced with the rowname_col and groupname_col options:

data <- mtcars[1:4, c("am", "gear", "mpg", "cyl", "disp")] |>
  gt(rowname_col = "gear", groupname_col = "am") |>
  cols_merge(
    columns = c("mpg", "cyl")
  ) |>
  cols_width(
    "gear" ~ px(50),
    c("mpg", "disp") ~ px(150)
  )

yields

> colwidth_df
       type unspec lw    pt tbl_width
1 row_group      1  0   0.0        NA
2      stub      0  0  37.5        NA
3   default      0  0 112.5        NA
4    hidden      1  0   0.0        NA
5   default      0  0 112.5        NA
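
A possible reading of why the filter is not enough here, sketched in base R with the values from the output above. The assumption (not confirmed in this thread) is that, with the default layout, the row group is rendered as a heading row rather than as a table column, so its row in colwidth_df has no counterpart among the column definitions.

```r
# Column-width table copied from the stub/row-group output above.
colwidth_df <- data.frame(
  type   = c("row_group", "stub", "default", "hidden", "default"),
  unspec = c(1L, 0L, 0L, 1L, 0L),
  pt     = c(0, 37.5, 112.5, 0, 112.5),
  stringsAsFactors = FALSE
)

# Dropping only the hidden rows still leaves the row_group row in place.
visible <- colwidth_df[colwidth_df$type != "hidden", ]
print(visible$type)

# Assumption: the row group is not a column in the default layout, so it
# would have to be dropped as well before indexing by column position.
rendered <- visible[visible$type != "row_group", ]
print(rendered$pt)
```

If the row group is shown as a column instead, its row would presumably have to stay, so a real fix likely needs to consult the stub layout rather than filter on type alone.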

olivroy commented 3 weeks ago

Thanks both for your investigation!

Basically, if you use cols_merge(c(col1, col2)), under the hood, gt does the merging, and keeps col1 as the new column and calls cols_hide() on col2.

We'd happily accept a PR for this with tests, clear explanations, and before-and-after screenshots of the result.

snhansen commented 3 weeks ago

@olivroy: I'm taking a look at this and currently trying to understand stub layouts. Could you give me an example where the result of get_stub_layout() has length 2? I can't think of such an example, but from the code it seems possible, and I'd like to handle all cases properly.

olivroy commented 3 weeks ago

get_stub_layout() returns length 2 if you have both row_group_as_column = TRUE and the data has row names.
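
A minimal sketch of such a case, reusing the mtcars example from earlier in the thread. It assumes that the internal helpers gt:::build_data() and gt:::get_stub_layout() can be called this way in gt 0.11.0; internals may change between versions.

```r
library(gt)

# Row names come from `gear`, row groups from `am`, and the row group is
# placed in its own column instead of a heading row.
tbl <- mtcars[1:4, c("am", "gear", "mpg", "cyl", "disp")] |>
  gt(
    rowname_col = "gear",
    groupname_col = "am",
    row_group_as_column = TRUE
  )

# Build the table for the LaTeX context, as done earlier in the thread,
# then inspect the stub layout (internal, unexported function).
built <- gt:::build_data(data = tbl, context = "latex")
stub_layout <- gt:::get_stub_layout(data = built)

print(stub_layout)
print(length(stub_layout))
```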