insightsengineering / rtables

Reporting tables with R
https://insightsengineering.github.io/rtables/

[Bug]: Accessing columns by path doesn't work when table has an overall column. #908

Open hafen opened 1 month ago

hafen commented 1 month ago

Using the example found here (minus the secondary column splitting split_cols_by("SEX")):

library(rtables) # also attaches formatters, which provides ex_adsl
library(dplyr)

.simpleCap <- function(x) {
  if (length(x) > 1) {
    return(sapply(x, .simpleCap))
  }
  s <- strsplit(tolower(x), " ")[[1]]
  paste(toupper(substring(s, 1, 1)), substring(s, 2), sep = "", collapse = " ")
}

adsl2 <- ex_adsl |>
  filter(SEX %in% c("M", "F") & RACE %in% (levels(RACE)[1:3])) |>
  mutate(ethnicity = .simpleCap(gsub("(.*)OR.*", "\\1", RACE)), RACE = factor(RACE))

lyt2 <- basic_table() |>
  split_cols_by("ARM") |>
  split_rows_by("RACE", labels_var = "ethnicity", split_fun = drop_split_levels) |>
  summarize_row_groups() |>
  analyze(c("AGE", "STRATA1"))

tbl2 <- build_table(lyt2, adsl2)

tbl2
#             A: Drug X    B: Placebo   C: Combination
# ————————————————————————————————————————————————————
# Asian       66 (54.1%)   66 (55.5%)     71 (59.2%)  
#   AGE                                               
#     Mean      32.50        36.68          36.99     
#   STRATA1                                           
#     A           21           24             18      
#     B           20           22             25      
#     C           25           20             28      
# Black       30 (24.6%)   28 (23.5%)     28 (23.3%)  
#   AGE                                               
#     Mean      34.27        34.93          33.71     
#   STRATA1                                           
#     A           7            11             10      
#     B           11           7              8       
#     C           12           10             10      
# White       26 (21.3%)   25 (21.0%)     21 (17.5%)  
#   AGE                                               
#     Mean      36.15        33.12          31.95     
#   STRATA1                                           
#     A           8            6              8       
#     B           9            12             7       
#     C           9            7              6 

Adding a column footnote works for this, for example:

col_paths_summary(tbl2)
# label             path               
# —————————————————————————————————————
# A: Drug X         ARM, A: Drug X     
# B: Placebo        ARM, B: Placebo    
# C: Combination    ARM, C: Combination
fnotes_at_path(tbl2, rowpath = NULL, c("ARM", "B: Placebo")) <- c("this is a placebo")
tbl2
#             A: Drug X    B: Placebo {1}   C: Combination
# ————————————————————————————————————————————————————————
# Asian       66 (54.1%)     66 (55.5%)       71 (59.2%)  
#   AGE                                                   
#     Mean      32.50          36.68            36.99     
# ...
#     C           9              7                6       
# ————————————————————————————————————————————————————————

# {1} - this is a placebo
# ————————————————————————————————————————————————————————

However, if we add an overall column, this no longer works:

lyt3 <- lyt2 |> add_overall_col("Total")
tbl3 <- build_table(lyt3, adsl2)
tbl3
#             A: Drug X    B: Placebo   C: Combination      Total   
# ——————————————————————————————————————————————————————————————————
# Asian       66 (54.1%)   66 (55.5%)     71 (59.2%)     203 (56.2%)
#   AGE                                                             
#     Mean      32.50        36.68          36.99           35.43   
# ...
#     C           9            7              6              22  
col_paths_summary(tbl3)
# label             path               
# —————————————————————————————————————
# A: Drug X         ARM, A: Drug X     
# B: Placebo        ARM, B: Placebo    
# C: Combination    ARM, C: Combination
# Total             Total, Total
fnotes_at_path(tbl3, rowpath = NULL, c("ARM", "B: Placebo")) <- c("this is a placebo")
# Error in col_fnotes_at_path(coltree(ttrp), colpath, fnotes = value) : 
#   Path appears invalid at step: ARM

Also, possibly related: the actual example found here has its footnote numbered as NA in the output shown in the article.

I encounter this using rtables 0.6.9 installed from CRAN as well as v0.6.9.9005 installed from GitHub.

Melkiades commented 1 month ago

It seems to be a legitimate bug. @gmbecker, could you take a look? It may be coming from the colcounts update.

gmbecker commented 1 month ago

@Melkiades This isn't coming from the colcounts method; it's been there for a long time (see end of message, those versions of rtables and formatters are CRAN versions from 2023).

What is happening has to do with the structure generated by adding an overall column that way: it inserts a root within which the ARM substructure and the overall substructure are siblings:

> coltree_structure(tbl)
[root] (no pos)
   [ARM] (no pos)
     [A: Drug X] (ARM: A: Drug X)
     [B: Placebo] (ARM: B: Placebo)
     [C: Combination] (ARM: C: Combination)
   [overall] (no pos)
     [overall] (overall: overall)

Also (this is back to using the CRAN versions), pathing itself does work, because the brackets work, so the issue is more specifically with the column footnote machinery.

> tbl[, c("ARM", "B: Placebo")]
Note: method with signature ‘VTableTree#missing#ANY’ chosen for function ‘[’,
 target signature ‘TableTree#missing#character’.
 "VTableTree#ANY#character" would also be valid
            B: Placebo
——————————————————————
Asian       66 (55.5%)
  AGE                 
    Mean      36.68   
  STRATA1             
    A           24    
    B           22    
    C           20    
Black       28 (23.5%)
  AGE                 
    Mean      34.93   
  STRATA1             
    A           11    
    B           7     
    C           10    
White       25 (21.0%)
  AGE                 
    Mean      33.12   
  STRATA1             
    A           6     
    B           12    
    C           7     
> tbl[, c("overall", "overall")]
              overall  
———————————————————————
Asian       203 (56.2%)
  AGE                  
    Mean       35.43   
  STRATA1              
    A           63     
    B           67     
    C           73     
Black       86 (23.8%) 
  AGE                  
    Mean       34.30   
  STRATA1              
    A           28     
    B           26     
    C           32     
White       72 (19.9%) 
  AGE                  
    Mean       33.88   
  STRATA1              
    A           22     
    B           28     
    C           22     

Note that the more technically correct paths also work within the pathing machinery that the bracket is hitting:

> tbl[, c("root", "overall", "overall")]
              overall  
———————————————————————
Asian       203 (56.2%)
  AGE                  
    Mean       35.43   
  STRATA1              
    A           63     
    B           67     
    C           73     
Black       86 (23.8%) 
  AGE                  
    Mean       34.30   
  STRATA1              
    A           28     
    B           26     
    C           32     
White       72 (19.9%) 
  AGE                  
    Mean       33.88   
  STRATA1              
    A           22     
    B           28     
    C           22     

The culprit looks to be col_fnotes_at_path, which has its own custom pathing machinery that apparently never handled the nested = FALSE column splitting case correctly.
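
The contrast described above (bracket pathing succeeds, the footnote setter fails) can be condensed into one short script. This is only a sketch: it uses a smaller layout than the one in the report (a single analyze("AGE") on ex_adsl), and the names tbl, pth, sub and res are illustrative. Whether the setter errors depends on the installed rtables version, so the outcome is captured rather than assumed.

```r
library(rtables) # also attaches formatters, which provides ex_adsl

# Smaller layout than the one in the report, same column structure:
# an ARM split plus an overall column added with add_overall_col()
tbl <- basic_table() |>
  split_cols_by("ARM") |>
  add_overall_col("overall") |>
  analyze("AGE") |>
  build_table(ex_adsl)

pth <- c("ARM", "B: Placebo")

# Bracket subsetting resolves the column path
sub <- tbl[, pth]
ncol(sub)

# On affected versions the footnote setter rejects the same path;
# capture the outcome instead of stopping the script
res <- tryCatch(
  {
    fnotes_at_path(tbl, rowpath = NULL, colpath = pth) <- "this is a placebo"
    "footnote added"
  },
  error = function(e) conditionMessage(e)
)
res
```

On an affected version, res should hold the "Path appears invalid" message quoted in the report; on a version where the setter accepts the path, it holds "footnote added".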

Session info for error

> library(formatters)
> library(rtables)
> library(dplyr)
> .simpleCap <- function(x) {
+ if (length(x) > 1) {
+ return(sapply(x, .simpleCap))
+ }
+ s <- strsplit(tolower(x), " ")[[1]]
+ paste(toupper(substring(s, 1, 1)), substring(s, 2), sep = "", collapse = " ")
+ }
> adsl2 <- ex_adsl |>
+ filter(SEX %in% c("M", "F") & RACE %in% (levels(RACE)[1:3])) |>
+ mutate(ethnicity = .simpleCap(gsub("(.*)OR.*", "\\1", RACE)), RACE = factor(RACE))
> lyt2 <- basic_table() |>
+ split_cols_by("ARM") |>
+ split_rows_by("RACE", labels_var = "ethnicity", split_fun = drop_split_levels) |>
+ summarize_row_groups() |>
+ analyze(c("AGE", "STRATA1"))
> lyt3 <- lyt2 |> add_overall_col("overall")
> tbl <- build_table(lyt3, adsl2)
> fnotes_at_path(tbl, rowpath = NULL, c("ARM", "B: Placebo")) <- c("this is a placebo")
Error in col_fnotes_at_path(coltree(ttrp), colpath, fnotes = value) : 
  Path appears invalid at step: ARM
> sessionInfo()
R version 4.3.3 (2024-02-29)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Sonoma 14.5

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/Los_Angeles
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] dplyr_1.1.4      rtables_0.6.3    magrittr_2.0.3   formatters_0.5.2

loaded via a namespace (and not attached):
 [1] digest_0.6.34     backports_1.4.1   utf8_1.2.4        R6_2.5.1          fastmap_1.1.1     tidyselect_1.2.0 
 [7] glue_1.7.0        tibble_3.2.1      pkgconfig_2.0.3   htmltools_0.5.8.1 generics_0.1.3    lifecycle_1.0.4  
[13] cli_3.6.2         fansi_1.0.6       grid_4.3.3        vctrs_0.6.5       compiler_4.3.3    tools_4.3.3      
[19] checkmate_2.3.1   pillar_1.9.0      rlang_1.1.3      

gmbecker commented 1 month ago

@hafen this is a legitimate bug. In the meantime, the workaround is to use add_overall_level as a split function (or more accurately a split function factory):

lyt4 <- basic_table() |>
  split_cols_by("ARM", split_fun = add_overall_level("overall", first = FALSE)) |>
  split_rows_by("RACE", labels_var = "ethnicity", split_fun = drop_split_levels) |>
  summarize_row_groups() |>
  analyze(c("AGE", "STRATA1"))
> tbl2 <- build_table(lyt4, adsl2)
> fnotes_at_path(tbl2, rowpath = NULL, c("ARM", "B: Placebo")) <- c("this is a placebo")
> tbl2
            A: Drug X    B: Placebo {1}   C: Combination     overall  
——————————————————————————————————————————————————————————————————————
Asian       66 (54.1%)     66 (55.5%)       71 (59.2%)     203 (56.2%)
  AGE                                                                 
    Mean      32.50          36.68            36.99           35.43   
  STRATA1                                                             
    A           21             24               18             63     
    B           20             22               25             67     
    C           25             20               28             73     
Black       30 (24.6%)     28 (23.5%)       28 (23.3%)     86 (23.8%) 
  AGE                                                                 
    Mean      34.27          34.93            33.71           34.30   
  STRATA1                                                             
    A           7              11               10             28     
    B           11             7                8              26     
    C           12             10               10             32     
White       26 (21.3%)     25 (21.0%)       21 (17.5%)     72 (19.9%) 
  AGE                                                                 
    Mean      36.15          33.12            31.95           33.88   
  STRATA1                                                             
    A           8              6                8              22     
    B           9              12               7              28     
    C           9              7                6              22     
——————————————————————————————————————————————————————————————————————

{1} - this is a placebo
——————————————————————————————————————————————————————————————————————

The difference here is that we are adding an "overall" facet that is a direct sibling to the arm facets, rather than one that derives from a separate top-level branch like add_overall_col does.

> coltree_structure(tbl2)
[ARM] (no pos)
   [A: Drug X] (ARM: A: Drug X)
   [B: Placebo] (ARM: B: Placebo)
   [C: Combination] (ARM: C: Combination)
   [overall] (ARM: overall)
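
To see the difference between the two approaches side by side, the column paths of both tables can be compared directly. The following is a minimal sketch, assuming a reduced layout (a single analyze("AGE") on ex_adsl) rather than the full example above; the object names lyt_col, lyt_lvl, tbl_col, tbl_lvl, paths_col and paths_lvl are illustrative.

```r
library(rtables) # also attaches formatters, which provides ex_adsl

# Overall column added as a separate top-level branch
lyt_col <- basic_table() |>
  split_cols_by("ARM") |>
  add_overall_col("overall") |>
  analyze("AGE")

# Overall facet added as a sibling of the ARM levels
lyt_lvl <- basic_table() |>
  split_cols_by("ARM", split_fun = add_overall_level("overall", first = FALSE)) |>
  analyze("AGE")

tbl_col <- build_table(lyt_col, ex_adsl)
tbl_lvl <- build_table(lyt_lvl, ex_adsl)

# col_paths() returns one character vector per column
paths_col <- col_paths(tbl_col)
paths_lvl <- col_paths(tbl_lvl)

# Path of the last (overall) column under each approach
paths_col[[length(paths_col)]]
paths_lvl[[length(paths_lvl)]]
```

With add_overall_level, the overall column's path starts with "ARM" like the other columns, which is why the footnote path c("ARM", "B: Placebo") resolves without a separate top-level branch getting in the way.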