pharmaverse / ggsurvfit

http://www.danieldsjoberg.com/ggsurvfit/
Other
67 stars 19 forks source link

Labeling error #205

Closed therneau closed 2 months ago

therneau commented 2 months ago

The survfit routine computes Aalen-Johansen estimates. Two special cases of the AJ are the Kaplan-Meier (KM) with 2 states and 1 transtion, and the cumulative incidence (CI) for the case of competing risks. More complex models do not reduce to either the simple KM nor CI formula. Here is an example from the colon data (most of the work is setting this up). The Obs and Lev arms have nearly identical curves, and lumping them together simplifies the plot.

Big issue: the label of "Cumulative Incidence" is wrong, badly wrong. As a reviewer I would reject the figure. A rational default is "Probability in state" or P(state), reflecting the survfit result's pstate component. Current state as below could be ok.

Smaller issue: The function name itself is unwise, since it is not creating a cumulative incidence plot.

Positive: Colors, line types, decoration, etc. on the plot are first rate. My plot function is numerically correct, but getting a nice figure is a lot more work.

library(ggsurvfit)
#> Loading required package: ggplot2
library(survival)

# first a figure
states <- c("Entry", "Progression", "Death")
smat <- matrix(0, 3, 3, dimnames=list(states, states))
smat[1, 2:3] <- smat[2,3] <- 1
statefig(1:2, smat)


d1 <- subset(colon, etype==1)
d2 <- subset(colon, etype==2)
cdata <- tmerge(subset(d1,,c(id, rx, extent, node4)), d2, id=id,
                death=event(time, status))
cdata <- tmerge(cdata, d1, id = id, recur=event(time, status))

# a death and recurrence on the same day is counted as a recurrence
cdata$state <- with(cdata, factor(ifelse(recur==1,1, 2*death),  0:2, 
                                  c("censor", "recur", "death")))
cdata$trt <- 1*(cdata$rx=="Lev+5FU") # lump the Obs and Lev arms together

cfit <- survfit(Surv(tstart, tstop, state) ~ trt, cdata, id=id, model=TRUE)

plot(cfit, xscale=365.25, lty=1:2, col= rep(1:2, each=2), lwd=2,
     xlab="Years from randomization", ylab="Current State")
legend(1440, .3, c("Death, control", "Death, Lev + 5FU", 
                   "Progression, control", "Progression, Lev + 5FU"),
       col=c(2,2,1,1), lwd=2, lty=1:2, bty='n')


ggcuminc(cfit, outcome=c("recur", "death")) 

Created on 2024-04-22 with reprex v2.1.0

therneau commented 2 months ago

(I had a dickens of a time getting the code into the github panel above, so much so that I despaired and closed it, before trying again and succeeding.)

ddsjoberg commented 2 months ago

Thanks for the post @therneau !

I made the default to align with the work I was doing primarily (cancer researchers are often using competing risks), and it's processed through a function that references cumulative incidence. But I agree that a change should be made.

  1. We could do something simple like reference in the documentation that the label is Cum. Inc. which does not apply to all multi-state models.
  2. Preferably, we could have different defaults for the simpler competing risks model vs the more complex multi-state models. @therneau is there is a simple way to distinguish between these two? Perhaps a look at the transition matrix? If only the top row has counts then competing risks, otherwise multi-state?
    cfit$transitions
    #>        to
    #> from    recur death (censored)
    #>   (s0)    468    38        423
    #>   recur     0   409         52
    #>   death     0     0          0
therneau commented 2 months ago

Having the code I posted actually execute is very slick. I don't know if you did something magic or it happened by itself...

  1. If (and only if) only one row has counts then it will be a competing risks problem. Essentially, if all the states but one are absorbing (no one ever leaves the state). That will almost always be the top row of the transtions matrix, but if one sets levels just right on the state variable and uses the istate option, I think you could get a different order of the rows.
  2. This isn't perfect, e.g. I have had a CR data set where there was a non-zero for "censored" in the death row, i.e., some further follow-up after death. This was a data error of course.
  3. I prefer the "probability in state" label for all cases. (I think that "cumulative incidence" is a poor label even in the competing risks case, since it is not the integral of the incidence and thus misleading. But my chances of convincing the world at large of this are slim to none.)
ddsjoberg commented 2 months ago

Having the code I posted actually execute is very slick. I don't know if you did something magic or it happened by itself...

I just copied your code onto my clipboard, then ran reprex::reprex() into the R console, and it runs the code, saves the results and adds your your clipbaord markdown text you can paste directly into GitHub. I love this tool!

ddsjoberg commented 2 months ago

thank for the context and advice. Update is on its way

library(ggsurvfit)
#> Loading required package: ggplot2
library(survival)

d1 <- subset(colon, etype==1)
d2 <- subset(colon, etype==2)
cdata <- tmerge(subset(d1,,c(id, rx, extent, node4)), d2, id=id,
                death=event(time, status))
cdata <- tmerge(cdata, d1, id = id, recur=event(time, status))

# a death and recurrence on the same day is counted as a recurrence
cdata$state <- with(cdata, factor(ifelse(recur==1,1, 2*death),  0:2, 
                                  c("censor", "recur", "death")))
cdata$trt <- 1*(cdata$rx=="Lev+5FU") # lump the Obs and Lev arms together

cfit <- survfit(Surv(tstart, tstop, state) ~ trt, cdata, id=id, model=TRUE)

ggcuminc(cfit, outcome=c("recur", "death")) 

Created on 2024-04-25 with reprex v2.1.0