pharmaverse / ggsurvfit

http://www.danieldsjoberg.com/ggsurvfit/
Other
74 stars 20 forks source link

Feature request: Add option to round risktable for weighted data #90

Closed lkowarski2 closed 2 years ago

lkowarski2 commented 2 years ago

There does not seem to be an option to round the add_risktable output. This is problematic for weighted analyses, as the default seems to allow 16 characters for each value in the add_risktable output, which decreases legibility and leads to overlapping numbers.

Can an option to round the add_risktable output be added? If an existing option is available, can it be added to the documentation?

ddsjoberg commented 2 years ago

that is a reasonable request, thank you for the post!

can you please add a minimal reproducible example here of the weighted analysis (ie code i can run on my machine) and I can work from there?

lkowarski2 commented 2 years ago

Thank you so much for your quick reply! I really like the aesthetics of your package, so I hope to be able to use it with this adjustment! The data that I'm using cannot be shared, but I created an example using the adtte pilot data


#Purpose: Demonstrate issue with being unable to round at risk tables in ggsurvfit when using weighted data
#Input data = CDISC pilot adtte dataset

###########Section 1: Load packages and set up input data for analysis
library(ggplot2)
library(survival)
library(visR)
library(ggsurvfit)

# load ADTTE from CDISC pilot 
data(adtte)
view(adtte)
nrow(adtte)

# Create random weight variable to reproduce example
set.seed(1234)
sipw <- as.data.frame(matrix(runif(254, min=0.5, max=1.5)))
names(sipw) <- c('SIPW')
sipw

#Add random weight variable to ADTTE CDISC pilot data
adtte_exwgt <- data.frame(adtte, sipw)
#recode cnsr to event
adtte_exwgt$EVNT <- ifelse(adtte_exwgt$CNSR==0, 1, 0)

#Check input data for example
colnames(adtte_exwgt)
summary(adtte_exwgt$SIPW)

###########Section 2: Demonstrate issue with default at risk table rounding (or lack of rounding)
##Reproduce error
survfit<-survfit(Surv(AVAL, EVNT) ~ TRTP, weight=SIPW, data=adtte_exwgt, 
                 id=USUBJID, cluster=USUBJID, robust=TRUE)
survfit$.Environment <- rlang::current_env()
class(survfit) <- c("survfit2", "survfit")
str(survfit)

ggsurvfit(survfit, linetype_aes = TRUE, type="risk") +
  add_confidence_interval()+
  add_risktable(times=seq(1, 200, by=30))

##No weight, no problem
survfit<-survfit(Surv(AVAL, EVNT) ~ TRTP, data=adtte_exwgt, 
                 id=USUBJID, cluster=USUBJID, robust=TRUE)
survfit$.Environment <- rlang::current_env()
class(survfit) <- c("survfit2", "survfit")
str(survfit)

ggsurvfit(survfit, linetype_aes = TRUE, type="risk") +
  add_confidence_interval()+
  add_risktable(times=seq(1, 200, by=30))
ddsjoberg commented 2 years ago

Somewhat smaller example

library(ggsurvfit)
#> Loading required package: ggplot2

set.seed(1234)
adtte_exwgt <- visR::adtte
adtte_exwgt$SIPW <- runif(nrow(visR::adtte), min=0.5, max=1.5)

# FIRST ISSUE: `survfit2()` does not allow the `weights=` argument! WHAT!?!
survfit2(
  formula = Surv_CNSR() ~ TRTP, 
  weight = SIPW,
  data = adtte_exwgt, 
  id = USUBJID, 
  cluster = USUBJID, 
  robust = TRUE
)
#> Error in eval(extras, data, env): ..1 used in an incorrect context, no ... to look in

# SECOND ISSUE: The Ns from `broom::tidy()` have no way of being formatted
survfit(
  formula = Surv_CNSR() ~ TRTP, 
  weight = SIPW,
  data = adtte_exwgt, 
  id = USUBJID, 
  cluster = USUBJID, 
  robust = TRUE
) |> 
  broom::tidy()
#> # A tibble: 161 × 9
#>     time n.risk n.event n.censor estimate std.error conf.high conf.low strata   
#>    <dbl>  <dbl>   <dbl>    <dbl>    <dbl>     <dbl>     <dbl>    <dbl> <chr>    
#>  1     1   84.0   0.807    0        0.990   0.00957     1        0.972 TRTP=Pla…
#>  2     2   83.2   0.614    0        0.983   0.0120      1        0.960 TRTP=Pla…
#>  3     3   82.6   1.75     0        0.962   0.0192      1        0.925 TRTP=Pla…
#>  4     7   80.9   1.31     0        0.947   0.0243      0.996    0.900 TRTP=Pla…
#>  5     8   79.6   0        1.40     0.947   0.0243      0.996    0.900 TRTP=Pla…
#>  6     9   78.2   1.10     0        0.933   0.0273      0.989    0.881 TRTP=Pla…
#>  7    12   77.1   0        0.744    0.933   0.0273      0.989    0.881 TRTP=Pla…
#>  8    13   76.3   0        0.933    0.933   0.0273      0.989    0.881 TRTP=Pla…
#>  9    14   75.4   0        0.681    0.933   0.0273      0.989    0.881 TRTP=Pla…
#> 10    16   74.7   0.780    0        0.924   0.0288      0.982    0.869 TRTP=Pla…
#> # … with 151 more rows

# which in turn, results in a wild risktable
survfit(
  formula = Surv_CNSR() ~ TRTP, 
  weight = SIPW,
  data = adtte_exwgt, 
  id = USUBJID, 
  cluster = USUBJID, 
  robust = TRUE
) |> 
  ggsurvfit() +
  add_risktable()
#> Warning: Use of .data in tidyselect expressions was deprecated in tidyselect 1.2.0.
#> Please use `any_of()` or `all_of()` instead.

Created on 2022-09-21 with reprex v2.0.2

lkowarski2 commented 2 years ago

Yes, sorry I neglected to mention the issue with survfit2 because I was able to get around it, but survfit2 also didn't work for me with the unweighted analysis. I was able to reproduce the survival object it creates using the below code and using survfit as the first parameter in ggsurvfit though to achieve the same aesthetic.

survfit<-survfit(Surv(AVAL, EVNT) ~ TRTP, weight=SIPW, data=adtte_exwgt, 
                  id=USUBJID, cluster=USUBJID, robust=TRUE)
survfit$.Environment <- rlang::current_env()
class(survfit) <- c("survfit2", "survfit")
str(survfit)

This was the result from my example w/ the weighted analysis. image

ddsjoberg commented 2 years ago

Thanks @lkowarski2 !

I made a fix so that survfit2() will work with the non-standard evaluation inputs (just like survivial::survfit().

Now, do you think we need an option to specify a formatting function for the Ns in the risk table, or can we just round to the nearest integer?

lkowarski2 commented 2 years ago

Thank you @ddsjoberg! I think that rounding to the nearest integer would work!

ddsjoberg commented 2 years ago

@lkowarski2 thanks for posting this issue!

updates are now in the main branch on GH, and we expect to make a release to CRAN in mid to late October

lkowarski2 commented 2 years ago

Thank you @ddsjoberg for so quickly incorporating my request! I look forward to the CRAN release!

ddsjoberg commented 1 year ago

@lkowarski2 FYI, in the next release of ggsurvfit, we are no longer rounding the weighted statistics by default. You'll need to round them using glue-like syntax. We're generalizing what can be placed in the risktable, and needed the ability to change the formatting function.

library(ggsurvfit)
#> Loading required package: ggplot2

adtte_exwgt <- visR::adtte
adtte_exwgt$SIPW <- runif(nrow(visR::adtte), min=0.5, max=1.5)

survfit2(
  formula = Surv_CNSR() ~ TRTP, 
  weight = SIPW,
  data = adtte_exwgt, 
  id = USUBJID, 
  cluster = USUBJID, 
  robust = TRUE
) |> 
  ggsurvfit() +
  add_risktable(
    risktable_stats = c("{round(n.risk)}", "{round(cum.event)}"),
    stats_label = c("At Risk", "Cumulative Events")
  )

Created on 2023-03-12 with reprex v2.0.2