insightsengineering / cards

CDISC Analysis Results Data
https://insightsengineering.github.io/cards/

Return more summary stats for the `ard_stack(by)` variable #219

Closed: ddsjoberg closed this issue 3 months ago

ddsjoberg commented 3 months ago
library(cards)
packageVersion("cards")
#> [1] '0.1.0.9008'

# the little n and p are missing for the overall tabulation of ARM
ard_stack(
  ADSL, 
  by = ARM, 
  .overall = TRUE,
  ard_categorical(variables = AGEGR1)
) |> 
  dplyr::filter(variable %in% "ARM")
#> {cards} data frame: 3 x 11
#>   group1 group1_level variable variable_level stat_name stat_label stat
#> 1   <NA>                   ARM        Placebo         N          N  254
#> 2   <NA>                   ARM      Xanomeli…         N          N  254
#> 3   <NA>                   ARM      Xanomeli…         N          N  254
#> ℹ 4 more variables: context, fmt_fn, warning, error

# we still get overall results when `.overall = FALSE`
ard_stack(
  ADSL, 
  by = ARM, 
  .overall = FALSE,
  ard_categorical(variables = AGEGR1)
) |> 
  dplyr::filter(variable %in% "ARM")
#> {cards} data frame: 3 x 11
#>   group1 group1_level variable variable_level stat_name stat_label stat
#> 1   <NA>                   ARM        Placebo         N          N  254
#> 2   <NA>                   ARM      Xanomeli…         N          N  254
#> 3   <NA>                   ARM      Xanomeli…         N          N  254
#> ℹ 4 more variables: context, fmt_fn, warning, error

Created on 2024-04-02 with reprex v2.1.0

ddsjoberg commented 3 months ago

FYI @statasaurus

ddsjoberg commented 3 months ago

Ahh, I misread the docs. Nothing is wrong with the .overall argument at all. When there is a by argument AND .overall = TRUE, the results are re-calculated with by=NULL (essentially, we can add a new column called Overall to our table with all treatment groups combined). This is exactly what is happening, and it is what we want.

I think the misunderstanding concerns this chunk, which executes whenever a by variable is present. We want not just the total N, but also the other summary stats (the little n and p) for the by variable. In a univariate tabulation, N is the same for every level of the by variable, which is why all three ARM rows above show 254.

ard_categorical(
  data = data,
  variables = all_of(by),
  statistic = everything() ~ categorical_summary_fns("N")
)
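To illustrate why N alone says little about the by variable, here is a minimal base R sketch. It does not use {cards}; the `arm` vector is made-up data standing in for `ADSL$ARM`, and `tab` is a hypothetical name. It only shows that N repeats across levels while n and p differ by level.

```r
# Hypothetical arm assignments (an assumption, not the real ADSL data)
arm <- c(rep("Placebo", 4), rep("Drug A", 3), rep("Drug B", 3))

n <- table(arm)   # per-level counts (little n)
N <- length(arm)  # total count, identical for every level
p <- n / N        # per-level proportions

# One row per level: N is constant, n and p vary
tab <- data.frame(
  variable_level = names(n),
  n = as.integer(n),
  N = N,
  p = as.numeric(p)
)
tab
```

Returning n and p alongside N for the by variable would presumably give rows of this shape; how that is requested through `categorical_summary_fns()` is not shown in this thread.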