Bug Report: Confidence interval in tidy_survfit() is in the wrong order when type = "risk"

MichaelDismorrMD commented 1 year ago

Hi, I noticed that the output of tidy_survfit(type = "risk") labels the lower confidence interval as conf.high, and the upper confidence interval as conf.low. Please see example code below.

Kind regards Michael Dismorr

library(ggsurvfit) 

library(dplyr)

survfit2(Surv(time, status) ~ factor(ph.ecog), data = df_lung) %>% 
  tidy_survfit(type = "risk") %>% 
  glimpse()
#> Rows: 220
#> Columns: 16
#> $ time                <dbl> 0.0000000, 0.1642710, 0.3613963, 0.4928131, 1.0184…
#> $ n.risk              <dbl> 63, 63, 62, 61, 60, 59, 58, 57, 56, 55, 54, 53, 52…
#> $ n.event             <dbl> 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0,…
#> $ n.censor            <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1,…
#> $ cum.event           <dbl> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 9, 10, 11, 11, 12, 1…
#> $ cum.censor          <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 2, 2, 3, 4,…
#> $ estimate            <dbl> 0.00000000, 0.01587302, 0.03174603, 0.04761905, 0.…
#> $ std.error           <dbl> 0.00000000, 0.01600051, 0.02281286, 0.02817181, 0.…
#> $ conf.high           <dbl> 0.000000000, 0.000000000, 0.000000000, 0.000000000…
#> $ conf.low            <dbl> 0.00000000, 0.04625675, 0.07408538, 0.09878001, 0.…
#> $ strata              <fct> Asymptomatic, Asymptomatic, Asymptomatic, Asymptom…
#> $ estimate_type       <chr> "risk", "risk", "risk", "risk", "risk", "risk", "r…
#> $ estimate_type_label <chr> "Risk", "Risk", "Risk", "Risk", "Risk", "Risk", "R…
#> $ monotonicity_type   <chr> "increasing", "increasing", "increasing", "increas…
#> $ strata_label        <chr> "factor(ph.ecog)", "factor(ph.ecog)", "factor(ph.e…
#> $ conf.level          <dbl> 0.95, 0.95, 0.95, 0.95, 0.95, 0.95, 0.95, 0.95, 0.…

^{Created on 2023-06-08 with reprex v2.0.2}

ddsjoberg commented 1 year ago

Thank you for the post @MichaelDismorrMD . I was aware of this issue, but didn't bother correcting because the package is focused on resulting figures (and does not matter for the figures) and this function is primarily used for internal pkg purposes.

But I do export it, so it should be made accurate. I'll take a look how much restructuring it would take

ddsjoberg commented 1 year ago

The tricky bit being that the type argument accepts any function, so it can be tricky to know if the UB and LB switch

MichaelDismorrMD commented 1 year ago

Thank you for exporting it, it's a really convenient function. Since the function documentation states it must be one of survival, risk or cumhaz, the switch could be hard-coded in any of those cases, perhaps by setting an argument switch.ci = T.

Then for any user supplied functions, let the user specify if the switch is needed (since it is not possible to incorporate into .transfun()....

Perhaps something like this would work at line 188 # Transform estimates------:

if(switch.ci = T) {
  cols <- c(conf.high = "conf.low", conf.low = "conf.high") # Using a named vector for names after mutate
  x <-
    x %>%
    dplyr::mutate(
      dplyr::across(dplyr::any_of(c("estimate"), .transfun)
                    ) %>% 
    dplyr::mutate(
      dplyr::across(dplyr::any_of(c(cols), .transfun), .keep = "unused" # Remove the old conf.low and conf.high
    )
} else {
x <-
  x %>%
  dplyr::mutate(
    dplyr::across(dplyr::any_of(c("estimate", "conf.low", "conf.high")), .transfun)
  )
}

pharmaverse / ggsurvfit

Bug Report: Confidence interval in tidy_survfit() is in the wrong order when type = "risk" #154