insightsengineering / tern

Table, Listings, and Graphs (TLG) library for common outputs used in clinical trials
https://insightsengineering.github.io/tern/

[Bug]: add_rowcounts doesn't work if layout begins with non-population variable (e.g. AVISIT) #1060

Open anajens opened 10 months ago

anajens commented 10 months ago

What happened?

In the layout we first split by AVISIT (a non-population dataset variable) and then by ARM. When trying to add the row counts (N=xx) from alt_counts_df = ADSL, there is an error because ADSL does not include AVISIT.

A not very nice workaround is to add a dummy AVISIT to ADSL and repeat the dataset as many times as there are levels in AVISIT.

Noticed this when working on PKPT03.

library(rtables)
library(tern)
library(scda)
library(dplyr)

adsl <- synthetic_cdisc_dataset("latest", "adsl")
advs <- synthetic_cdisc_dataset("latest", "advs") %>%
  filter(AVISITN %in% c(0, 1)) %>%
  filter(PARAMCD %in% c("SYSBP", "DIABP"))

lyt <- basic_table() %>%
  split_cols_by_multivar(c("AVAL", "CHG")) %>%
  split_rows_by("AVISIT", split_fun = drop_split_levels) %>%
  split_rows_by("ARM", split_fun = drop_split_levels) %>%
  add_rowcounts(alt_counts = TRUE) %>%
  split_rows_by("PARAM", split_fun = drop_split_levels) %>%
  analyze_colvars(afun = mean, format = "xx.x")

# Error
build_table(lyt, advs, alt_counts_df = adsl)

Error

Error: Following error encountered in splitting alt_counts_df:  variable(s) [AVISIT] not present in data. (VarLevelSplit)
# not pretty workaround: dummy adsl with visit
adsl_visit <- rbind(adsl, adsl) %>%
  select(ARM) %>%
  mutate(
    AVISIT = factor(rep.int(c("BASELINE", "WEEK 1 DAY 8"), c(nrow(adsl), nrow(adsl))))
  )

build_table(lyt, advs, alt_counts_df = adsl_visit)

Desired output

BASELINE                                   
  A: Drug X (N=134)                        
    Diastolic Blood Pressure               
      mean                     96.5    0.0 
    Systolic Blood Pressure                
      mean                     151.7   0.0 
  B: Placebo (N=134)                       
    Diastolic Blood Pressure               
      mean                     101.1   0.0 
    Systolic Blood Pressure                
      mean                     149.5   0.0 
  C: Combination (N=132)                   
    Diastolic Blood Pressure               
      mean                     102.8   0.0 
    Systolic Blood Pressure                
      mean                     144.7   0.0 
WEEK 1 DAY 8                               
  A: Drug X (N=134)                        
    Diastolic Blood Pressure               
      mean                     100.6   4.1 
...
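The `rep.int()` call in the workaround hard-codes exactly two visits. A somewhat more general sketch of the same idea, in base R, takes the visit levels from the analysis data and cross-joins them onto ADSL with `merge(..., by = NULL)`, which returns the Cartesian product of two data frames. `expand_alt_counts()` and the toy data are hypothetical, not part of tern or rtables; the sketch assumes the outer split variable exists in the analysis data but not in ADSL, and it keeps only the levels that actually occur there (as `drop_split_levels` does).

```r
# Hypothetical helper (not part of tern or rtables): repeat alt_df once per
# level of each split variable in `vars` that alt_df does not already have.
expand_alt_counts <- function(alt_df, df, vars) {
  for (v in setdiff(vars, names(alt_df))) {
    stopifnot(v %in% names(df))
    x <- df[[v]]
    # Keep only levels that occur in the analysis data (like drop_split_levels)
    lv <- if (is.factor(x)) levels(droplevels(x)) else sort(unique(x))
    lv_df <- data.frame(level = factor(lv, levels = lv))
    names(lv_df) <- v
    # by = NULL makes merge() return the Cartesian product:
    # every alt_df row is paired with every level
    alt_df <- merge(alt_df, lv_df, by = NULL)
  }
  alt_df
}

# Toy stand-ins for ADSL (one row per subject) and ADVS
adsl_toy <- data.frame(
  USUBJID = c("S1", "S2", "S3"),
  ARM = factor(c("A: Drug X", "B: Placebo", "A: Drug X"))
)
advs_toy <- data.frame(
  USUBJID = c("S1", "S1", "S2"),
  AVISIT = factor(
    c("BASELINE", "WEEK 1 DAY 8", "BASELINE"),
    levels = c("SCREENING", "BASELINE", "WEEK 1 DAY 8")
  )
)

adsl_visit_toy <- expand_alt_counts(adsl_toy, advs_toy, "AVISIT")
# Each visit level now carries every ADSL row
table(adsl_visit_toy$AVISIT, adsl_visit_toy$ARM)
```

With the real datasets, passing `expand_alt_counts(adsl, advs, "AVISIT")` as `alt_counts_df` should give the same (N=xx) counts as `adsl_visit`, since every visit facet then contains each ADSL row exactly once.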

Example 2: With trim_levels_in_group split function

Set AVAL and CHG to NA for one specific combination of the split variables (BASELINE, A: Drug X, DIABP); we want this combination to stay displayed in the table as missing.

advs_miss <- advs %>%
  mutate(
    # NA_real_ keeps the type double, which older dplyr::if_else() requires
    AVAL = if_else(
      AVISIT == "BASELINE" & ARM == "A: Drug X" & PARAMCD == "DIABP",
      NA_real_, AVAL
    ),
    CHG = if_else(
      AVISIT == "BASELINE" & ARM == "A: Drug X" & PARAMCD == "DIABP",
      NA_real_, CHG
    )
  )

lyt_trim <- basic_table() %>%
  split_cols_by_multivar(c("AVAL", "CHG")) %>%
  split_rows_by("AVISIT", split_fun = drop_split_levels) %>%
  split_rows_by("ARM", split_fun = trim_levels_in_group("PARAMCD")) %>% ## change split fun here <------
  add_rowcounts(alt_counts = TRUE) %>%
  split_rows_by("PARAMCD") %>%
  analyze_colvars(afun = mean, format = "xx.x")

build_table(lyt_trim, advs_miss, alt_counts_df = adsl_visit)

This gives an error because PARAMCD is not in alt_counts_df:

Error: Following error encountered in splitting alt_counts_df: Error applying custom split function: no applicable method for 'droplevels' applied to an object of class "NULL"
    split: VarLevelSplit (ARM)
    occured at path: AVISIT[BASELINE]

Now we add PARAMCD to alt_counts_df to avoid the error:

adsl_avisit_param <- adsl_visit %>%
  mutate(PARAMCD = factor(NA_character_, levels = levels(advs$PARAMCD)))

build_table(lyt_trim, advs_miss, alt_counts_df = adsl_avisit_param)

And get the desired table:

                          AVAL    CHG 
———————————————————————————————————————
BASELINE                               
  A: Drug X (N=134)                    
    DIABP                              
      mean                  NA      NA 
    SYSBP                              
      mean                 151.7   0.0 
  B: Placebo (N=134)                   
    DIABP                              
      mean                 101.1   0.0 
    SYSBP                              
      mean                 149.5   0.0 
...
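The `adsl_avisit_param` step generalizes to any variable that alt_counts_df only needs to contain so that the split function can run (here PARAMCD, which `trim_levels_in_group()` looks up). The base R sketch below adds each missing variable as an all-`NA` factor with the levels from the analysis data. `add_na_split_vars()` is a hypothetical helper, and treating an all-`NA` column as sufficient is an assumption carried over from the PARAMCD example above rather than documented rtables behaviour.

```r
# Hypothetical helper (not part of tern or rtables): add every variable in
# `vars` that alt_df lacks as an all-NA factor with the levels used in df.
add_na_split_vars <- function(alt_df, df, vars) {
  for (v in setdiff(vars, names(alt_df))) {
    stopifnot(v %in% names(df))
    x <- df[[v]]
    lv <- if (is.factor(x)) levels(x) else sort(unique(x))
    # All values are NA, but the factor still carries the full set of levels
    alt_df[[v]] <- factor(rep(NA_character_, nrow(alt_df)), levels = lv)
  }
  alt_df
}

# Toy stand-ins for ADSL and ADVS
adsl_toy <- data.frame(
  USUBJID = c("S1", "S2"),
  ARM = factor(c("A: Drug X", "B: Placebo"))
)
advs_toy <- data.frame(PARAMCD = factor(c("DIABP", "SYSBP", "DIABP")))

adsl_param_toy <- add_na_split_vars(adsl_toy, advs_toy, "PARAMCD")
str(adsl_param_toy)
```

Unlike the outer AVISIT split, no rows are duplicated here: the column only has to exist with the right levels.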
anajens commented 10 months ago

@Melkiades I added a second example to the issue.

tl;dr: alt_counts_df (ADSL) needs to be pre-processed to include potentially all variables used in the row splits of the layout (depending on the type of split functions used).

From the rtables perspective this makes sense; it's just not very user-friendly.
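Until alt_counts_df is handled more gracefully, a cheap guard is to check which row-split variables it lacks before calling `build_table()`, rather than discovering them one error at a time. The sketch below is hypothetical (`missing_alt_counts_vars()` is not a tern or rtables function) and takes a hand-written vector of split variables; it does not read them from the layout object.

```r
# Hypothetical check (not part of tern or rtables): return, in layout order,
# the row-split variables that alt_df does not contain, with a message.
missing_alt_counts_vars <- function(alt_df, split_vars) {
  missing <- setdiff(split_vars, names(alt_df))
  if (length(missing) > 0) {
    message("alt_counts_df lacks split variable(s): ",
            paste(missing, collapse = ", "))
  }
  invisible(missing)
}

# Row-split variables of lyt_trim, listed by hand
split_vars <- c("AVISIT", "ARM", "PARAMCD")
adsl_toy <- data.frame(USUBJID = c("S1", "S2"), ARM = c("A: Drug X", "B: Placebo"))
missing_alt_counts_vars(adsl_toy, split_vars)
```

Whether a reported variable needs replicated rows (like AVISIT) or just an all-NA column (like PARAMCD) still depends on the split function, which this check cannot tell.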

edelarua commented 8 months ago

Related to https://github.com/insightsengineering/tern/issues/535