insightsengineering / cards

CDISC Analysis Results Data
https://insightsengineering.github.io/cards/

Make it easier to get *total* population N? #236

Closed: bzkrouse closed this issue 3 weeks ago

bzkrouse commented 2 months ago

The total population N can be inferred from the ard_* results, but doing so requires post-processing that users have to repeat each time. For example, ard_categorical(ADSL, variables = TRTA) gives treatment group-level n's and population Ns. For downstream use, users would need to either manipulate the rows with Ns (keeping just one row and renaming it to make clear that it is a total N) or do some other manipulation.

In ard_stack() with .overall = TRUE, we get population-level stats for all the variables, so a total N row would fit well alongside those rows.
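A minimal sketch of the kind of post-processing described above. It assumes the `ADSL` example data that ships with cards and uses `ARM` as the treatment variable purely to keep the example self-contained; the object names `ard` and `total_n` are illustrative.

```r
library(cards)

# Tabulate the treatment variable; the result holds n, N, and p rows
# for the levels of ARM
ard <- ard_categorical(ADSL, variables = ARM)

# `stat` is a list column, so subset it to the rows named "N" and
# take the first element as the single population N
total_n <- ard$stat[ard$stat_name == "N"][[1]]
total_n
```

This only extracts the number. Turning it into a clearly labelled ARD row still needs the extra renaming step, which is the repeated effort this issue is about.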

ddsjoberg commented 2 months ago

We discussed a while back that the total Ns are returned in ard_missing(), but one of the approaches below could make more sense in some situations. Just a few thoughts from a chat with @bzkrouse:

library(cards)

# good, but don't love that for the majority of situations, user would probably expect TRT = Overall
# but that paradigm falls apart with more than one by variable
ADSL |> 
  dplyr::mutate(..overall.. =  "Overall") |> 
  ard_categorical(
    ..overall..,
    statistic = everything() ~ categorical_summary_fns("N")
  )
#> {cards} data frame: 1 x 9
#>      variable variable_level   context stat_name stat_label stat
#> 1 ..overall..        Overall categori…         N          N  254
#> ℹ 3 more variables: fmt_fn, warning, error

# works well, except if there is more than one by variable
ADSL |> 
  ard_missing(
    ARM,
    statistic = everything() ~ missing_summary_fns("N_obs")
  ) |> 
  dplyr::mutate(
    context = "categorical",
    stat_name = "N",
    variable_level = "<< Overall >>"
  ) |>
  tidy_ard_column_order() 
#> {cards} data frame: 1 x 9
#>   variable variable_level   context stat_name stat_label stat
#> 1      ARM      << Overa… categori…         N  Vector L…  254
#> ℹ 3 more variables: fmt_fn, warning, error

# this also works, but repeats calculations (but also not that big of a deal?)
by <- c("ARM", "AGEGR1")
ADSL |> 
  dplyr::mutate(dplyr::across(all_of(by), ~"Overall")) |> 
  ard_categorical(
    all_of(by),
    statistic = everything() ~ categorical_summary_fns("N")
  )
#> {cards} data frame: 2 x 9
#>   variable variable_level   context stat_name stat_label stat
#> 1      ARM        Overall categori…         N          N  254
#> 2   AGEGR1        Overall categori…         N          N  254
#> ℹ 3 more variables: fmt_fn, warning, error

Created on 2024-04-29 with reprex v2.1.0

ddsjoberg commented 1 month ago

I just had a thought for an easy way to get the total N using the ard_attributes() function. We could just return a row with N as an attribute of the data frame.

ddsjoberg commented 1 month ago

Let's add another function to return this information: ard_total_n(), which would return something like:

mtcars |> 
  dplyr::mutate(..total_n.. = 1L) |> 
  cards::ard_categorical(variables = ..total_n.., statistic = ~cards::categorical_summary_fns("N")) |> 
  dplyr::select(-"variable_level")
#> {cards} data frame: 1 x 8
#>      variable   context stat_name stat_label stat fmt_fn
#> 1 ..total_n.. categori…         N          N   32      0
#> ℹ 2 more variables: warning, error

Created on 2024-05-20 with reprex v2.1.0
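The same idea can be wrapped in a small helper, roughly as below. This is only a sketch: `total_n_sketch()` is a hypothetical name, not a cards function, and it relies on the default statistics of `ard_categorical()` including a row named `"N"`, as the outputs in this thread suggest.

```r
# Hypothetical helper (not part of cards): returns one ARD row holding
# the total number of rows in `data`
total_n_sketch <- function(data) {
  data |>
    # Add a constant column so the tabulation has exactly one level
    dplyr::mutate(..total_n.. = 1L) |>
    cards::ard_categorical(variables = "..total_n..") |>
    # Keep only the N statistic; n and p are redundant for a constant column
    dplyr::filter(stat_name == "N") |>
    # The single level of the constant column carries no information
    dplyr::select(-"variable_level")
}

res <- total_n_sketch(mtcars)
res
```

Filtering after the default call, rather than passing a `statistic` argument, avoids depending on the exact form that argument takes.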

We also need to decide how this is called from ard_stack().