insightsengineering / cards

CDISC Analysis Results Data
https://insightsengineering.github.io/cards/

Function to fill missing calculated statistics? #185

Closed ddsjoberg closed 7 months ago

ddsjoberg commented 7 months ago

Base R summary functions are inconsistent in what they return when passed invalid input. In the example below, we pass a character vector that is all NA. Functions like mean() and sd() return NA (mean() with a warning), but quantile() throws an error because the input is not numeric, which leaves its statistic as NULL.

data.frame(x = rep_len(NA_character_, 10)) |> 
  cards::ard_continuous(variables = x)
#> {cards} data frame: 8 x 8
#>   variable stat_name stat_label statistic   warning     error
#> 1        x         N          N         0                    
#> 2        x      mean       Mean        NA argument…          
#> 3        x        sd         SD        NA                    
#> 4        x    median     Median        NA                    
#> 5        x       p25  25th Per…                     non-nume…
#> 6        x       p75  75th Per…                     non-nume…
#> 7        x       min        Min        NA                    
#> 8        x       max        Max        NA
#> ℹ 2 more variables: context, statistic_fmt_fn

Created on 2024-02-17 with reprex v2.1.0
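To make the three outcomes concrete (value, value with warning, error with no value), here is a minimal base R sketch. The `capture()` helper is hypothetical and is not how {cards} is implemented internally; it only illustrates why an errored statistic ends up as NULL while a warned one still holds NA. Note that `quantile()` is called with its default arguments here, so the error message may differ from the one in the ARD above.

```r
# Hypothetical helper: evaluate an expression and keep its value,
# warning message, and error message in separate slots.
capture <- function(expr) {
  warning_msg <- NULL
  result <- tryCatch(
    withCallingHandlers(
      list(value = expr), # `expr` is a promise, so it is evaluated here
      warning = function(w) {
        # record the warning, then let evaluation continue
        warning_msg <<- conditionMessage(w)
        invokeRestart("muffleWarning")
      }
    ),
    # on error there is no value, only a message
    error = function(e) list(error = conditionMessage(e))
  )
  list(statistic = result$value, warning = warning_msg, error = result$error)
}

x <- rep_len(NA_character_, 10)

res_mean <- capture(mean(x))                   # NA plus a warning
res_p25  <- capture(quantile(x, probs = 0.25)) # error, so statistic is NULL

str(res_mean)
str(res_p25)
```

The point of the sketch is only that a warning still yields a statistic (NA), whereas an error yields none (NULL), which is the inconsistency described above.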

To keep things consistent for downstream processing, it could be helpful to have a function that fills NULL statistics with a value such as NA.

Perhaps something like this:

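# Replace NULL statistics with `value` in the rows where `rows` is TRUE.
# `rows` is evaluated row by row, so it may reference ARD columns such as `error`.
replace_null <- function(x, rows = TRUE, value = NA) {
  cards:::check_class(x, "card")

  x |>
    # work one row at a time so list-column elements can be tested with is.null()
    dplyr::rowwise() |>
    dplyr::mutate(
      # styler: off
      statistic =
        if (is.null(.data$statistic) && {{ rows }}) list(.env$value)
        else list(.data$statistic)
      # styler: on
    ) |>
    # drop the rowwise grouping, restore the original groups and class
    dplyr::group_by(dplyr::pick(dplyr::group_vars(x))) |>
    structure(class = class(x))
}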

data.frame(x = rep_len(NA_character_, 10)) |> 
  cards::ard_continuous(variables = x) |> 
  replace_null(rows = !is.null(error))
#> {cards} data frame: 8 x 8
#>   variable stat_name stat_label statistic   warning     error
#> 1        x         N          N         0                    
#> 2        x      mean       Mean        NA argument…          
#> 3        x        sd         SD        NA                    
#> 4        x    median     Median        NA                    
#> 5        x       p25  25th Per…        NA           non-nume…
#> 6        x       p75  75th Per…        NA           non-nume…
#> 7        x       min        Min        NA                    
#> 8        x       max        Max        NA
#> ℹ 2 more variables: context, statistic_fmt_fn

Created on 2024-02-17 with reprex v2.1.0
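For reference, the same idea can be sketched without {cards} or {dplyr}, operating directly on a list column. This is only an illustration under stated assumptions: `fill_null()` is a hypothetical name, and the `statistic` and `error` lists below are hand-written stand-ins for the ARD columns, not real output.

```r
# Hand-written stand-ins for the `statistic` and `error` list columns
statistic <- list(0L, NA_real_, NULL, NULL, NA_character_)
error <- list(NULL, NULL, "non-numeric argument", "non-numeric argument", NULL)

# Hypothetical helper: replace NULL elements with `value`,
# restricted to the positions where `rows` is TRUE.
fill_null <- function(statistic, rows = rep(TRUE, length(statistic)), value = NA) {
  is_null <- vapply(statistic, is.null, logical(1))
  # list(value) is recycled over every selected element
  statistic[is_null & rows] <- list(value)
  statistic
}

# Fill only the rows where an error was recorded
has_error <- !vapply(error, is.null, logical(1))
filled <- fill_null(statistic, rows = has_error)

str(filled)
```

Assigning `list(value)` rather than `value` matters here: assigning a bare NULL to list elements would delete them, and wrapping the value keeps the column's length unchanged.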