therneau / survival

Survival package for R

documentation of `survcheck()` #277

Open ThomasSoeiro opened 3 months ago

ThomasSoeiro commented 3 months ago

I have trouble understanding how to use survcheck(). I do not think there is any issue in the code but maybe the documentation could be improved (in particular regarding the formula).

I have a survival data set that contains data for 3 cohorts of patients. A patient can be included in several cohorts. In the end I build a survival model for each cohort. I start the analysis with a crude comparison of the cohorts:

survdiff(Surv(time, status) ~ cohort, df)
km <- survfit(Surv(time, status) ~ cohort, df)
plot(km)

I wanted to check the data. My first try was to reuse the same formula as above, but the RHS of the formula seems to be ignored (see the "Overlap check"):

survcheck(Surv(time, status) ~ cohort, df, id = id)
# Call:
# survcheck(formula = Surv(time, status) ~ cohort, data = df, id = id)
# 
# Unique identifiers       Observations        Transitions 
#                107                150                140 
# 
# Transitions table:
#       to
# from     1 (censored)
#   (s0) 140          8
#   1      0          0
# 
# Number of subjects with 0, 1, ... transitions to each state:
#        count
# state   0  1  2 3
#   1     8 62 33 4
#   (any) 8 62 33 4
# 
# Overlap check: 39 ids (43 rows)

Finally, I found that the following calls return identical output (apart from the call component):

survcheck(Surv(time, status) ~ cohort, df, id = id)
survcheck(Surv(time, status) ~ 1, df, id = id)
survcheck(Surv(time, status) ~ strata(cohort), df, id = id)
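
One way to confirm this programmatically, rather than by comparing printed output, is sketched below. It rebuilds the pooled data with the same steps as the example data at the end of this post. The seed is an added assumption, so the counts will differ from those shown above, and `drop_call()` is a hypothetical helper, not part of survival.

```r
library(survival)

# Pooled data: three samples of 50 veteran patients, so ids can repeat.
# The seed is an assumption added here for repeatability.
set.seed(42)
df <- veteran
df$id <- seq_len(nrow(df))
df <- replicate(3, df[sample(nrow(df), 50), ], simplify = FALSE)
df <- do.call(rbind, Map(transform, df, cohort = 1:3))

# Hypothetical helper: remove the stored call so only the results are compared.
drop_call <- function(x) {
  x$call <- NULL
  x
}

c1 <- survcheck(Surv(time, status) ~ cohort, df, id = id)
c2 <- survcheck(Surv(time, status) ~ 1, df, id = id)
c3 <- survcheck(Surv(time, status) ~ strata(cohort), df, id = id)

# Each element is TRUE when the right-hand side had no effect on the result
same12 <- isTRUE(all.equal(drop_call(c1), drop_call(c2)))
same13 <- isTRUE(all.equal(drop_call(c1), drop_call(c3)))
c(same12, same13)
```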

It seems that I need to split the data before running survcheck():

by(
  df,
  ~ cohort,
  \(x) survcheck(Surv(time, status) ~ 1, x, id = id)
)
# cohort: 1
# Call:
# survcheck(formula = Surv(time, status) ~ 1, data = x, id = id)
# 
# Unique identifiers       Observations        Transitions 
#                 50                 50                 48 
# 
# Transitions table:
#       to
# from    1 (censored)
#   (s0) 48          2
#   1     0          0
# 
# Number of subjects with 0, 1, ... transitions to each state:
#        count
# state   0  1
#   1     2 48
#   (any) 2 48
# 
# ------------------------------------------------------------------------------------------------------- 
# cohort: 2
# Call:
# survcheck(formula = Surv(time, status) ~ 1, data = x, id = id)
# 
# Unique identifiers       Observations        Transitions 
#                 50                 50                 46 
# 
# Transitions table:
#       to
# from    1 (censored)
#   (s0) 46          4
#   1     0          0
# 
# Number of subjects with 0, 1, ... transitions to each state:
#        count
# state   0  1
#   1     4 46
#   (any) 4 46
# 
# ------------------------------------------------------------------------------------------------------- 
# cohort: 3
# Call:
# survcheck(formula = Surv(time, status) ~ 1, data = x, id = id)
# 
# Unique identifiers       Observations        Transitions 
#                 50                 50                 46 
# 
# Transitions table:
#       to
# from    1 (censored)
#   (s0) 46          4
#   1     0          0
# 
# Number of subjects with 0, 1, ... transitions to each state:
#        count
# state   0  1
#   1     4 46
#   (any) 4 46
# 

Some data to reproduce examples:

library(survival)  # provides the veteran data set
df <- veteran
df$id <- seq_len(nrow(df))
df <- replicate(3, df[sample(nrow(df), 50), ], simplify = FALSE)
df <- Map(transform, df, cohort = 1:3)
df <- do.call(rbind, df)

therneau commented 3 months ago

survcheck is intended for multi-state survival.
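
For readers unfamiliar with the multi-state case, here is a minimal sketch of the kind of data this refers to. It assumes the `mgus2` data set shipped with the package and a competing-risks coding of it. The column names `etime` and `event` are illustrative choices, and this is not presented as the canonical usage.

```r
library(survival)

# Competing-risks coding of mgus2: progression to plasma cell malignancy (pcm)
# or death without progression, whichever comes first.
# The column names etime and event are illustrative.
d <- mgus2
d$etime <- with(d, ifelse(pstat == 0, futime, ptime))
d$event <- with(d, ifelse(pstat == 0, 2 * death, 1))
d$event <- factor(d$event, 0:2, labels = c("censor", "pcm", "death"))

# A factor status makes this a multi-state Surv object; id identifies subjects
chk <- survcheck(Surv(etime, event) ~ 1, data = d, id = id)
chk
```

With more than one non-censored state, the transitions table has a column per state, which is where the per-state counts become informative.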

ThomasSoeiro commented 3 months ago

Currently, this does not appear in the title or in the Description, only in Details. However, if I understand correctly, some checks are useful for "standard" survival data sets too.
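
As an illustration of such a check: in the first survcheck() output in this thread, 150 observations for 107 unique identifiers leaves 43 extra rows, which matches the 43 rows on the "Overlap check" line. That is consistent with the overlap check flagging patients who appear in more than one cohort, presumably because every row starts at time 0. The same bookkeeping can be sketched in base R, independently of survcheck(). The seed is an added assumption, so the counts will not match those above.

```r
library(survival)

# Rebuild pooled data as in the reproduction code; the seed is an assumption.
set.seed(42)
df <- veteran
df$id <- seq_len(nrow(df))
df <- replicate(3, df[sample(nrow(df), 50), ], simplify = FALSE)
df <- do.call(rbind, Map(transform, df, cohort = 1:3))

# Rows whose id has already appeared earlier in the pooled data
extra_rows <- sum(duplicated(df$id))
# Ids that occur on more than one row
repeated_ids <- unique(df$id[duplicated(df$id)])

c(rows = nrow(df), unique_ids = length(unique(df$id)),
  extra_rows = extra_rows, repeated_ids = length(repeated_ids))

# Number of ids that appear in 1, 2 or 3 cohorts
table(table(df$id))
```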

I understand that this is low priority. I opened the issue just to let you know. Feel free to close without further comment. Thanks!

therneau commented 3 months ago

I do appreciate the comment. I work hard at making good documentation, but as someone who has worked on the package for a very long time, I have blind spots where something is "obvious" to me but not to the user. Input like yours is the best way for me to find out. But I do have too many projects to get to this right away.

ThomasSoeiro commented 3 months ago

Your hard work has already paid off; I think that the survival documentation is already excellent! (even for someone like me with no formal training in statistics)