therneau / survival

Survival package for R
381 stars 104 forks source link

Incorrect Efron approximation for competing risks #268

Open SiddRoy opened 1 month ago

SiddRoy commented 1 month ago

The competing risk vignette has an example that shows how we can fit competing risks model by manually stacking data in order to more easily share coefficients. However, the resulting cumulative hazard curves with new data do not match.

After looking into the code, the Efron approximation code in the coxsurv1 (and 2) used in multihaz doesn't match the correct approximation code in agsurv4 and agsurv5.

Making the following changes in coxsurv1 resulted in matching results between the 2 approaches.

  1. line 150: want k < 10 to reset n[8:9] = 0 and avoid accumulating components from previous intervals.
  2. line 183: meanwt = n[5] / n[3]
  3. Efron sums on 185-186 should be adding 1/(n[2] - k * meanwt)
  4. Based on 3, we would want to also use reciprocal terms on 179-180, 188-189 and change miltihaz function to use cn[,,5] / cn[,,9] for hazard computation similar to coxsurv.fit and agsurv.

Reproducible code showing the difference in cumulative hazards

library(survival)
library(tidyverse)

# example from competing risks vignette: setup mgus2
mgus2$etime <- with(mgus2, ifelse(pstat==0, futime, ptime))
event <- with(mgus2, ifelse(pstat==0, 2*death, 1))
mgus2$event <- factor(event, 0:2, labels=c("censor", "pcm", "death"))

# stacked vs. multistate CR: DATA setup by removing NA's 
mgus3 <- filter(mgus2, !is.na(mspike), !is.na(hgb))
temp1 <- mgus3 %>% mutate(tim = etime, status = (event == 'pcm') * 1, group = 'pcm')
temp2 <- mgus3 %>% mutate(tim = etime, status = (event == 'death') * 1, group = 'death')
stacked <- bind_rows(temp1, temp2)

# stacked vs. multistate CR: Cox fit with same hgb coefficient 
hgfit <-  coxph(list(Surv(etime, event) ~ age + sex + mspike,
                     1:2 + 1:3 ~ hgb / common),
                data = mgus3, id = id, x = TRUE, model = TRUE)
allfit <- coxph(Surv(tim, status) ~ hgb + (age + sex+ mspike) * strata(group),
                data=stacked)

# Compare cox results
max(abs(hgfit$loglik- allfit$loglik)) # 0

# stacked vs. multistate CR: Survival Curves
newd <- mgus3[1, ] %>% select(id, age, sex, mspike, hgb) %>% mutate(age = 35, mspike = .6, hgb = 13.3)
sf_MS <- survfit(hgfit, newdata = newd, se.fit = TRUE)

newd2 <- bind_rows(
  mutate(newd, group = 'pcm'),
  mutate(newd, group = 'death'))
sf_strat <- survfit(allfit, newdata = newd2)

# These should be the same also!
sf_MS$cumhaz[1:10,1, ]
matrix(sf_strat$cumhaz, 267)[1:10, ]
therneau commented 1 month ago

Impressive. I'll try to add your test to my set of test cases, though I'll need to remove all the tidyverse stuff.

SiddRoy commented 1 month ago

Thanks :)!