sinhrks / ggfortify

Define fortify and autoplot functions to allow ggplot2 to handle some popular R packages.
Other
525 stars 65 forks source link

`n.risk` at `time == 0` is not correct when `nlevels(strata) > 1` in `fortify.survfit(*, surv.connect = TRUE)` #229

Closed ThomasSoeiro closed 1 week ago

ThomasSoeiro commented 4 months ago

System information

Describe the problem

n.risk at time == 0 is not correct when nlevels(strata) > 1 in fortify.survfit(*, surv.connect = TRUE).

Source code / logs / plots

library(survival)

fit <- survfit(Surv(time, status) ~ x, data = aml)
summary(fit)
# Call: survfit(formula = Surv(time, status) ~ x, data = aml)
# 
#                 x=Maintained 
#  time n.risk n.event survival std.err lower 95% CI upper 95% CI
#     9     11       1    0.909  0.0867       0.7541        1.000
#    13     10       1    0.818  0.1163       0.6192        1.000
#    18      8       1    0.716  0.1397       0.4884        1.000
#    23      7       1    0.614  0.1526       0.3769        0.999
#    31      5       1    0.491  0.1642       0.2549        0.946
#    34      4       1    0.368  0.1627       0.1549        0.875
#    48      2       1    0.184  0.1535       0.0359        0.944
# 
#                 x=Nonmaintained 
#  time n.risk n.event survival std.err lower 95% CI upper 95% CI
#     5     12       2   0.8333  0.1076       0.6470        1.000
#     8     10       2   0.6667  0.1361       0.4468        0.995
#    12      8       1   0.5833  0.1423       0.3616        0.941
#    23      6       1   0.4861  0.1481       0.2675        0.883
#    27      5       1   0.3889  0.1470       0.1854        0.816
#    30      4       1   0.2917  0.1387       0.1148        0.741
#    33      3       1   0.1944  0.1219       0.0569        0.664
#    43      2       1   0.0972  0.0919       0.0153        0.620
#    45      1       1   0.0000     NaN           NA           NA

ggfortify:::fortify.survfit(fit, surv.connect = TRUE) |> head()
#   time n.risk n.event n.censor      surv    std.err     upper     lower        strata
# 1    0     11       0        0 1.0000000 0.00000000 1.0000000 1.0000000    Maintained
# 2    0     11       0        0 1.0000000 0.00000000 1.0000000 1.0000000 Nonmaintained  # <- n.risk should be 12
# 3    9     11       1        0 0.9090909 0.09534626 1.0000000 0.7541338    Maintained
# 4   13     10       1        1 0.8181818 0.14213381 1.0000000 0.6192490    Maintained
# 5   18      8       1        0 0.7159091 0.19508758 1.0000000 0.4884263    Maintained
# 6   23      7       1        0 0.6136364 0.24873417 0.9991576 0.3768671    Maintained

The issue is from: https://github.com/sinhrks/ggfortify/blob/5ed95240075a7322bd910402d0a1d259f0e98f02/R/fortify_surv.R#L65

Maybe it can be changed to d[d$time == ave(d$time, d$strata, FUN = min), ].

And then this will need to be simplified:

https://github.com/sinhrks/ggfortify/blob/5ed95240075a7322bd910402d0a1d259f0e98f02/R/fortify_surv.R#L73-L79

ThomasSoeiro commented 1 month ago

Hi @terrytangyuan, Do you confirm the issue? Would you consider a patch based on what I proposed in https://github.com/sinhrks/ggfortify/issues/229#issue-2176062346? Thanks!

terrytangyuan commented 3 weeks ago

Please submit a PR with additional test cases. Thanks!