therneau / survival

Survival package for R
386 stars 104 forks source link

`survfitAJ` error with non-uniform starting states on Survival version 3.6-4 and above #278

Open steliosbl opened 2 weeks ago

steliosbl commented 2 weeks ago

Hello, I am attempting to fit a multi-state Cox PH with different records having different starting states at time t0.

I have put together a minimal, reproducible example of the error I am facing using the mgus data exactly as demonstrated in the vignette. My deviation is that, while in the vignette a record will always enter the model at state (s0), I want to allow entry in other states, for example directly into the state pcm at t0.

library(survival)

# First, we prepare the multi-state dataset.
# This code is copied directly from the multi-state vignette, section 2.2
ptemp <- with(mgus2, ifelse(ptime==futime & pstat==1, ptime-.1, ptime))
data3 <- tmerge(mgus2, mgus2,  id=id, death=event(futime, death),
                  pcm = event(ptemp, pstat))
data3 <- tmerge(data3, data3, id, enum=cumtdc(tstart))
temp <- with(data3, ifelse(death==1, 2, pcm))
data3$event <- factor(temp, 0:2, labels=c("censor", "pcm", "death"))

# We add an `istate` column giving the starting state in each interval. 
# For simplicity, we auto-generate this using survcheck
check <- survcheck(Surv(tstart, tstop, event) ~ 1, id=id, data3)
data3$istate <- check$istate

### Key ingredient ###
# We want to include records that begin in states other than (s0). 
# To demonstrate this, we will alter one record. Record 1499 normally goes (s0) -> death
# We will alter it so that it goes pcm -> death instead. This corresponds to a patient starting already at pcm at time 0.

data3[1499, "istate"] <- "pcm"

# Finally, we fit the Cox PH model and run survfit
msfit <- coxph(Surv(tstart, tstop, event) ~ sex + age, data=data3, id=id, istate=istate)

ndata <- expand.grid(sex=c("F", "M"), age=c(60, 80))
mssurv <- survfit(msfit, newdata=ndata) # The line where errors may occur

In version 3.5-8 of the package, this runs as expected. However, in versions 3.6-4 and 3.7-0, the following error occurs in the final line:

> mssurv <- survfit(msfit, newdata=ndata)
Error in survfitAJ(as.factor(tempstrat), Y, weights, id = oldid, istate = istate,  : 
  no one at risk for one of the curves, at the default time 0; specify a start.time

What is the cause of this error in the newer versions? Is there a bug, or do newer versions not support multi-state situations with non-uniform starting states? Or, if they do, is there a special procedure for that?

Thank you for your assistance.

therneau commented 2 weeks ago

This is a result of being more careful in the code, but not foreseeing your case.
A. When not everyone starts in the same state and the survfit call does not specify p0, but does specify a start.time t0, then p0 is estimated by using the distribution of all those at risk just before t0. That is, all obs with time1 < t0 and time2 >= t0. So if start.time= t0 were age 60, for instance, you do not want to count someone who entered at age 60: they were (just barely) not yet in the risk set.
B. When p0 is specified, we use it. C. It start.time is not given, I set t0 = min observed start time, to use it as the starting point for plots. Detailed test cases for residuals.survfit forced me to think all this through more carefully.

In every data set I have, where subjects start in multiple states, I use either start.time or p0 to create curves, (and almost always the latter) It has simply been what is appropriate for every one of my data examples. Thus, in the update where I tried to be much more careful about the starting estimates I missed the case you give. While I go off and think about it, add "start.time=.01" to your call and all will be well. Who exactly is in or out at a tied time is something I try to be very careful about.

I myself would add p0=c(1,0,0) as the argument, to obtain curves that answered the question "what is the future survival for someone who starts in state (s0) at time 0", or use p0=c(0,1,0) for "what is the future survival for someone who starts in the pcm state at time 0". I have not been interested in "survival for a mixed cohort comprised of 1383/1384 s0 and 1/1384 pcm patients". In any given study, that initial mix usually has more to do with the enrollment strategy than a real population prevalence.

steliosbl commented 2 weeks ago

Thank you very much for your quick response. Both solutions, start.time=0.1 and setting p0, work.

To clarify my use case a bit for your interest, I am indeed investigating the future survival for a mixed cohort, although with more than just a single individual entering in intermediate states!

It's also my fault I hadn't realised the use of the p0 argument in survfit, I will be using it from now on!