insightsengineering / cards

CDISC Analysis Results Data
https://insightsengineering.github.io/cards/

Populate "overall" categories for `ard_stack(..., .overall = TRUE, .shuffle = TRUE)` #234

Open bzkrouse opened 2 months ago

bzkrouse commented 2 months ago

We have some code in shuffle_ard() that populates a level with "Overall <varname>" when a grouping variable is present but its level is NA (as happens with overall stat tests). It would be nice for the same to happen for the overall calculations from ard_stack(), effectively creating a separate "overall" category for displays. This only requires adding a bit of code in ard_stack() prior to the shuffle.

example:

ard_stack(
  data = adsl_,
  by = TRT01A,
  ard_continuous(variables = AGE),
  ard_categorical(variables = AGEGR1),
  .shuffle = TRUE,
  .overall = TRUE
)  

Current rows corresponding to the overall calculations:

# A tibble: 17 × 6
   TRT01A variable label           context     stat_name    stat
   <chr>  <chr>    <chr>           <chr>       <chr>       <dbl>
 1 NA     AGE      N               continuous  N         254    
 2 NA     AGE      Mean            continuous  mean       75.1  
 3 NA     AGE      SD              continuous  sd          8.25 
 4 NA     AGE      Median          continuous  median     77    
 5 NA     AGE      25th Percentile continuous  p25        70    
 6 NA     AGE      75th Percentile continuous  p75        81    
 7 NA     AGE      Min             continuous  min        51    
 8 NA     AGE      Max             continuous  max        89    
 9 NA     AGEGR1   <65             categorical n          33    
10 NA     AGEGR1   <65             categorical N         254    

Desired result:

# A tibble: 26 × 6
   TRT01A         variable label           context     stat_name   stat
   <chr>          <chr>    <chr>           <chr>       <chr>      <dbl>
 1 Overall TRT01A AGE      N               continuous  N         254   
 2 Overall TRT01A AGE      Mean            continuous  mean       75.1 
 3 Overall TRT01A AGE      SD              continuous  sd          8.25
 4 Overall TRT01A AGE      Median          continuous  median     77   
 5 Overall TRT01A AGE      25th Percentile continuous  p25        70   
 6 Overall TRT01A AGE      75th Percentile continuous  p75        81   
 7 Overall TRT01A AGE      Min             continuous  min        51   
 8 Overall TRT01A AGE      Max             continuous  max        89   
 9 Overall TRT01A AGEGR1   <65             categorical n          33   
10 Overall TRT01A AGEGR1   <65             categorical N         254   
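
To make the requested replacement concrete, here is a minimal base R sketch of the idea, not the cards implementation. The helper name `fill_overall()` is hypothetical, the input is a toy data frame standing in for shuffled output, and the Placebo row is made up for illustration. It assumes the grouping column is character rather than factor.

```r
# Hypothetical helper (not part of cards): replace NA levels in the
# grouping columns named in `by` with "Overall <column name>".
fill_overall <- function(x, by) {
  for (col in by) {
    x[[col]][is.na(x[[col]])] <- paste("Overall", col)
  }
  x
}

# Toy stand-in for shuffled output: one by-group row (illustrative values)
# and two overall rows whose grouping level is NA.
shuffled <- data.frame(
  TRT01A    = c("Placebo", NA, NA),
  variable  = c("AGE", "AGE", "AGE"),
  stat_name = c("N", "N", "mean"),
  stat      = c(86, 254, 75.1)
)

fill_overall(shuffled, by = "TRT01A")
```

Only the NA entries of `TRT01A` are touched; existing treatment levels and all other columns are left as they are.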

@ddsjoberg I will put in a PR to get your thoughts!

ddsjoberg commented 1 month ago

From our meeting on May 20, 2024: something like this could work for populating an "Overall" grouping level from within shuffle_ard():

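library(cards)

# Stack a by-group ARD with two ungrouped ARDs (group1 is NA for the latter)
tt <- bind_ard(
  ard_categorical(ADSL, by = ARM, variables = AGEGR1),
  ard_categorical(ADSL, variables = AGEGR1),
  ard_categorical(ADSL, variables = ARM)
)

# For ungrouped rows that have a grouped counterpart, fill in the grouping
# variable and set its level to "Overall"
tt_missing_by <-
  tt |>
  dplyr::filter(is.na(group1)) |>
  dplyr::rows_update(
    tt |>
      dplyr::filter(!is.na(group1)) |>
      dplyr::select(all_ard_variables(), "stat_name", "stat_label") |>
      dplyr::distinct() |>
      dplyr::mutate(group1 = "ARM", group1_level = list("Overall"), .before = 1L),
    by = c("variable", "variable_level", "stat_name", "stat_label"),
    # update rows without a match among the ungrouped rows are skipped
    unmatched = "ignore"
  )
tt_missing_by

# Write the updated rows back into the stacked ARD
tt[is.na(tt$group1), names(tt_missing_by)] <- tt_missing_by
tt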