fix `n.risk` at `time == 0` in `fortify.survfit(*, surv.connect = TRUE)` + fix `n.censor` for `survfitms` objects

ThomasSoeiro commented 3 months ago

Hi @terrytangyuan,

Here is a proposal for #229.

I kept the original behavior for survfitms objects because I am not sure if a fix is needed here too.

Some tests: (models are from ?survfit; fortify.survfit2 is the patched function)

library(survival)

fit <- survfit(Surv(time, status) ~ x, data = aml)
old <- ggfortify:::fortify.survfit(fit, surv.connect = TRUE)
new <- fortify.survfit2(fit, surv.connect = TRUE)
dplyr::setdiff(new, old)
#   time n.risk n.event n.censor surv std.err upper lower        strata
# 1    0     12       0        0    1       0     1     1 Nonmaintained

fitMS <- survfit(Surv(start, stop, event) ~ 1, id = id, data = mgus1)
old <- ggfortify:::fortify.survfit(fitMS, surv.connect = TRUE)
new <- fortify.survfit2(fitMS, surv.connect = TRUE)
identical(new, old)
# [1] TRUE

ThomasSoeiro commented 3 months ago

Thanks. Would you like to add some test cases for this?

Done in 92bc773.

I kept the original behavior for survfitms objects because I am not sure if a fix is needed here too.

Should we use d[d$time == ave(d$time, d$event, FUN = min), ] for survfitms objects?

fitMS <- survfit(Surv(start, stop, event) ~ 1, id = id, data = mgus1)
ggfortify:::fortify.survfit(fitMS) |> by(~ event, head)
# event: (s0)
#   time n.risk n.event n.censor    pstate     std.err     upper     lower event
# 1    6    241       0        0 0.9958506 0.004140760 1.0000000 0.9877679  (s0)
# 2    7    240       0        0 0.9917012 0.005843706 1.0000000 0.9803137  (s0)
# 3   31    239       0        0 0.9875519 0.007142061 1.0000000 0.9736524  (s0)
# 4   32    238       0        0 0.9834025 0.008229598 0.9996652 0.9674043  (s0)
# 5   39    237       0        0 0.9792531 0.009181538 0.9974150 0.9614220  (s0)
# 6   60    236       0        0 0.9751037 0.010036539 0.9949748 0.9556296  (s0)
# --------------------------------------------------------------------------- 
# event: pcm
#     time n.risk n.event n.censor pstate std.err upper lower event
# 301    6      0       0        0      0       0    NA    NA   pcm
# 302    7      0       0        0      0       0    NA    NA   pcm
# 303   31      0       0        0      0       0    NA    NA   pcm
# 304   32      0       0        0      0       0    NA    NA   pcm
# 305   39      0       0        0      0       0    NA    NA   pcm
# 306   60      0       0        0      0       0    NA    NA   pcm
# --------------------------------------------------------------------------- 
# event: death
#     time n.risk n.event n.censor      pstate     std.err      upper        lower event
# 601    6      0       1        0 0.004149378 0.004140760 0.02933707 0.0005868799 death
# 602    7      0       1        0 0.008298755 0.005843706 0.03299139 0.0020874940 death
# 603   31      0       1        0 0.012448133 0.007142061 0.03832457 0.0040432548 death
# 604   32      0       1        0 0.016597510 0.008229598 0.04386286 0.0062804232 death
# 605   39      0       1        0 0.020746888 0.009181538 0.04939151 0.0087147233 death
# 606   60      0       1        0 0.024896266 0.010036539 0.05486341 0.0112975857 death

ThomasSoeiro commented 3 months ago

BTW, regarding subset() in: https://github.com/sinhrks/ggfortify/blob/5191b4be6f5d4748f5fc55514d401c2044e784b5/R/fortify_surv.R#L42

From ?subset (Warning section):

This is a convenience function intended for use interactively. For programming it is better to use the standard subsetting functions like [[](http://127.0.0.1:61312/help/library/base/help/%5B), and in particular the non-standard evaluation of argument subset can have unanticipated consequences.

Would you like to change that? Why is subset() wrapped in suppressWarnings()?

terrytangyuan commented 3 months ago

Sure, can you help update that piece of code and make sure everything still works as expected?

ThomasSoeiro commented 3 months ago

Should we use d[d$time == ave(d$time, d$event, FUN = min), ] for survfitms objects?

I have implemented that in 7277152 (and a test in 973b124).

Sure, can you help update that piece of code and make sure everything still works as expected?

Done in 906ad32 with some more (basic) clean up.

Some tests: (models are from ?survfit; fortify.survfit2 is the patched function)

library(survival)

fit <- survfit(Surv(time, status) ~ x, data = aml)
old <- ggfortify:::fortify.survfit(fit, surv.connect = TRUE)
new <- fortify.survfit2(fit, surv.connect = TRUE)
dplyr::setdiff(new, old)
#   time n.risk n.event n.censor surv std.err upper lower        strata
# 1    0     12       0        0    1       0     1     1 Nonmaintained

new2 <- fortify.survfit2(fit)
new2$strata <- as.character(new2$strata)
tdy <- broom::tidy(fit)
tdy$strata <- sub("x=", "", tdy$strata)
names(tdy) <- names(new2)
tdy <- as.data.frame(tdy)
identical(new2, tdy)
# [1] TRUE

fitMS <- survfit(Surv(start, stop, event) ~ 1, id = id, data = mgus1)
old <- ggfortify:::fortify.survfit(fitMS, surv.connect = TRUE)
new <- fortify.survfit2(fitMS, surv.connect = TRUE)
dplyr::setdiff(new, old)
#   time n.risk n.event n.censor pstate std.err upper lower event
# 1    0      0       0        0      0       0     0     0   pcm
# 2    0      0       0        0      0       0     0     0 death

new2 <- fortify.survfit2(fitMS)
new2$event <- as.character(new2$event)
tdy <- broom::tidy(fitMS)
mapply(identical, new2, tdy)
#     time   n.risk  n.event n.censor   pstate  std.err    upper    lower    event 
#     TRUE     TRUE     TRUE    FALSE     TRUE     TRUE     TRUE     TRUE     TRUE

So there may be an issue in n.censor for survfitms objects.

ThomasSoeiro commented 3 months ago

So is there an issue in n.censor for survfitms objects?

I proposed a fix for that in b3f60c8 (inspired by broom:::tidy.survfit()).

Some tests: (models are from ?survfit; fortify.survfit2 is the patched function; tests run with survival version 3.5-8 and 3.7-0)

library(survival)

fit <- survfit(Surv(time, status) ~ x, data = aml)
old <- ggfortify:::fortify.survfit(fit, surv.connect = TRUE)
new <- fortify.survfit2(fit, surv.connect = TRUE)
dplyr::setdiff(new, old)
#   time n.risk n.event n.censor surv std.err upper lower        strata
# 1    0     12       0        0    1       0     1     1 Nonmaintained

new2 <- fortify.survfit2(fit)
new2$strata <- as.character(new2$strata)
tdy <- broom::tidy(fit)
tdy$strata <- sub("x=", "", tdy$strata)
names(tdy) <- names(new2)
tdy <- as.data.frame(tdy)
identical(new2, tdy)
# [1] TRUE

fitMS <- survfit(Surv(start, stop, event) ~ 1, id = id, data = mgus1)
old <- ggfortify:::fortify.survfit(fitMS, surv.connect = TRUE)
old <- subset(old, select = -n.censor)
new <- fortify.survfit2(fitMS, surv.connect = TRUE)
new <- subset(new, select = -n.censor)
dplyr::setdiff(new, old)
#   time n.risk n.event pstate std.err upper lower event
# 1    0      0       0      0       0     0     0   pcm
# 2    0      0       0      0       0     0     0 death

new2 <- fortify.survfit2(fitMS)
new2$event <- as.character(new2$event)
tdy <- broom::tidy(fitMS)
names(tdy) <- names(new2)
tdy <- as.data.frame(tdy)
identical(new2, tdy)
# [1] TRUE

The logic in fortify.survfit() could be further simplified, but we probably should not spend too much time on this since fortify() may be deprecated in favor of {broom}. I think we should mostly focus on correcting the output.

So I think we are done now. What do you think?

(Please note that the proposed changes need careful review. I am not used to multi-state survival model and I am far from an expert in survival analysis.)

sinhrks / ggfortify

fix `n.risk` at `time == 0` in `fortify.survfit(*, surv.connect = TRUE)` + fix `n.censor` for `survfitms` objects #230