insightsengineering / cards

CDISC Analysis Results Data
https://insightsengineering.github.io/cards/

Duplicated information (analysis population header count) #200

Closed aassuied-ps closed 4 months ago

aassuied-ps commented 4 months ago

Feature description

The CDISC ARD template provides the analysis population information (header) only once (see FDA STF.xlsx on the cdisc-org GitHub, AnalysisResults tab):

[image]

{cards}, by contrast, provides the frequencies for every variable and every variable level, so the population count is repeated. Here's an example:

# Libraries -----

# install.packages("remotes")
# remotes::install_github("insightsengineering/cards")

library(cards)
library(gtsummary)

# Table -----

## Analysis population -----
ADSLs <- cards::ADSL %>% dplyr::filter(SAFFL == "Y")

## Output -----
tbl <- ADSLs %>%
  gtsummary::tbl_summary(include = c(AGEGR1, DURDSGR1), by = ARM) %>%
  gtsummary::add_overall(last = TRUE)

print(tbl)

# ARD -----

## Without using cards::ard_stack() -----
ARD_1 <-
  cards::bind_ard(
    ard_categorical(ADSLs, by = "ARM", variables = c("AGEGR1", "DURDSGR1")),
    ard_categorical(ADSLs, variables = c("AGEGR1", "DURDSGR1"))
  )

# Keep only the N rows; arrange() has no .cols argument, so pass the column directly
pARD_1 <- ARD_1 %>%
  dplyr::filter(stat_label == "N") %>%
  dplyr::arrange(group1_level)
print(pARD_1)

## Using cards::ard_stack() -----
ARD_2 <-
  cards::ard_stack(
    data = ADSLs,
    by = ARM,
    .overall = TRUE,
    ard_categorical(variables = c("AGEGR1", "DURDSGR1"))
  )

# Keep only the N rows; arrange() has no .cols argument, so pass the column directly
pARD_2 <- ARD_2 %>%
  dplyr::filter(stat_label == "N") %>%
  dplyr::arrange(group1_level)
print(pARD_2)

Could we discuss a way to store this information only once and retrieve it when constructing the tables?

ddsjoberg commented 4 months ago

Hi @aassuied-ps! Thanks for the post.

The N is the number of non-missing observations for a specific variable, so if you don't have missing data, it will appear to repeat across variables. Within a variable, we decided to repeat N for each level of a categorical variable to make it very clear how percentages are calculated. In this simple case, the N is pretty obvious. But we can quickly get into more complex situations where the N would be less obvious (e.g. hierarchical/nested tabulations), and we've decided to make the denominators clear in all scenarios.

That said, if you don't want the N, you can modify the statistic argument to exclude it. Example below! Does that work for your needs?

library(cards)

ADSL |> 
  ard_categorical(
    variables = "AGEGR1",
    statistic = ~ categorical_summary_fns(c("n", "p"))
  )
#> {cards} data frame: 6 x 9
#>   variable variable_level   context stat_name stat_label  stat
#> 1   AGEGR1            <65 categori…         n          n    33
#> 2   AGEGR1            <65 categori…         p          %  0.13
#> 3   AGEGR1            >80 categori…         n          n    77
#> 4   AGEGR1            >80 categori…         p          % 0.303
#> 5   AGEGR1          65-80 categori…         n          n   144
#> 6   AGEGR1          65-80 categori…         p          % 0.567
#> ℹ 3 more variables: fmt_fn, warning, error

Created on 2024-02-27 with reprex v2.1.0
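
To illustrate the point above that N is the count of non-missing observations and serves as the denominator for the percentages, here is a minimal base R sketch. It does not use {cards} internals; the `age_group` vector is hypothetical toy data, and the column names only loosely mirror an ARD.

```r
# Hypothetical toy data: one categorical variable with one missing value
age_group <- c("<65", "65-80", "65-80", ">80", NA, "<65")

# N: number of non-missing observations for this variable
N <- sum(!is.na(age_group))

# n: count per level (table() drops NA values by default)
n <- table(age_group)

# p: proportion per level, using N as the denominator
p <- as.numeric(n) / N

# N is repeated on every row so the denominator is explicit for each level
result <- data.frame(
  variable_level = names(n),
  n = as.integer(n),
  N = N,
  p = p
)
print(result)
```

Because the missing value is excluded, N here is 5 rather than 6, which is the kind of case where a single population header count would not match the denominator actually used.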

aassuied-ps commented 4 months ago

Thanks @ddsjoberg for the answer. Yes, it does, but I did not find a .overall argument in ard_categorical like the one in ard_stack, so if we need the overall count, the code would look like this:

ARD_4 <-
  cards::bind_ard(
    ard_categorical(
      ADSL, 
      by = "ARM", 
      variables = c("SAFFL"),
      statistic = everything() ~ categorical_summary_fns("N")
    ),
    ard_categorical(
      ADSL, 
      variables = c("SAFFL"),
      statistic = everything() ~ categorical_summary_fns("N")
    ),   
    ard_categorical(
      ADSL, 
      by = "ARM", 
      variables = c("AGEGR1", "DURDSGR1"),
      statistic = everything() ~ categorical_summary_fns(c("n", "p"))
    ),
    ard_categorical(
      ADSL, 
      variables = c("AGEGR1", "DURDSGR1"),
      statistic = everything() ~ categorical_summary_fns(c("n", "p"))
    )
  )

# Keep only the N rows; arrange() has no .cols argument, so pass the column directly
pARD_4 <- ARD_4 %>%
  dplyr::filter(stat_label == "N") %>%
  dplyr::arrange(group1_level)
print(pARD_4)

This requires computing the analysis twice (I admit it's not often needed, but it is for some specific milestones).

Using ard_stack, I got this (the overall count appears several times when requested):

ARD_3 <-
  cards::ard_stack(
    data = ADSL,
    by = ARM,
    .overall = TRUE,
    ard_categorical(
      variables = c("SAFFL"),
      statistic = everything() ~ categorical_summary_fns("N")
    ),
    ard_categorical(
      variables = c("AGEGR1", "DURDSGR1"),
      statistic = everything() ~ categorical_summary_fns(c("n", "p"))
    )
  )

# Keep only the N rows; arrange() has no .cols argument, so pass the column directly
pARD_3 <- ARD_3 %>%
  dplyr::filter(stat_label == "N") %>%
  dplyr::arrange(group1_level)
print(pARD_3)

ddsjoberg commented 4 months ago

Right, you'll see the N three times, and that is an intentional decision to make it 100% clear what the denominator is. Although you're seeing it three times, it was not calculated three times.

library(cards)

cards::ard_stack(
  data = ADSL,
  by = ARM,
  .overall = TRUE,
  ard_categorical(
    variables = c("SAFFL"),
    statistic = everything() ~ categorical_summary_fns("n")
  ),
  ard_categorical(
    variables = c("AGEGR1", "DURDSGR1"),
    statistic = everything() ~ categorical_summary_fns(c("n", "p"))
  )
) |> 
  dplyr::filter(stat_name == "N")
#> {cards} data frame: 3 x 11
#>   group1 group1_level variable variable_level stat_name stat_label stat
#> 1   <NA>                   ARM        Placebo         N          N  254
#> 2   <NA>                   ARM      Xanomeli…         N          N  254
#> 3   <NA>                   ARM      Xanomeli…         N          N  254
#> ℹ 4 more variables: context, fmt_fn, warning, error

Created on 2024-02-27 with reprex v2.1.0
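
For readers who want a single header count per group, as asked earlier in the thread, one possible approach is to de-duplicate the repeated N rows after the ARD is built. The sketch below uses a hypothetical, simplified data frame with plain columns and toy values. A real {cards} ARD stores several of these columns as list columns, so it may need extra handling, and this only makes sense when N is identical across variables (no missing data).

```r
# Hypothetical, simplified ARD-like data frame with toy values
ard <- data.frame(
  group1_level   = rep(c("Placebo", "Treatment"), each = 3),
  variable       = rep(c("AGEGR1", "AGEGR1", "DURDSGR1"), times = 2),
  variable_level = rep(c("<65", ">80", "<12"), times = 2),
  stat_name      = "N",
  stat           = rep(c(86, 84), each = 3)
)

# Keep the N rows, drop the variable-specific columns, then de-duplicate
n_rows   <- ard[ard$stat_name == "N", c("group1_level", "stat_name", "stat")]
header_n <- unique(n_rows)
rownames(header_n) <- NULL

print(header_n)
```

If N differed between variables within a group, `unique()` would return more than one row for that group, which is a useful signal that a single header count is not appropriate.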

ddsjoberg commented 4 months ago

Hi @aassuied-ps ! Did I answer your question? Do you want to jump on a call this week to chat about the details?

aassuied-ps commented 4 months ago

Hi @ddsjoberg, sorry for my late answer. I'm gonna look into it a little bit more and we can have a call after that if needed. Thanks a lot again for all your answers! : )

ddsjoberg commented 4 months ago

Thanks @aassuied-ps !