therneau / survival

Survival package for R

"Special trickery for matched case-control data" and ymax in concordancefit() #228

Open neilstats opened 1 year ago

neilstats commented 1 year ago

Within concordancefit(), the "Special trickery for matched case-control data" changes y to create disjoint time intervals, but as a result the ymax argument (and presumably ymin as well) no longer has the intended effect.
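
To illustrate why a cutoff on the original time scale can misbehave once the times have been moved, here is a minimal base-R sketch. It assumes the shift is a simple additive per-stratum offset; the actual code inside concordancefit() is not reproduced here, and the data values and variable names are hypothetical.

```r
# Toy data: two strata with overlapping observed times (hypothetical values).
time    <- c(5, 10, 3, 8)
stratum <- c(1, 1, 2, 2)

# Assumed form of the shift: an additive offset per stratum, large enough
# that each stratum ends up on its own disjoint time interval.
delta   <- max(time) + 2
shifted <- time + stratum * delta   # stratum 1: 17, 22; stratum 2: 27, 32

# A cutoff above every observed time, so it should change nothing.
ymax <- 15

# Number of observations lying beyond the cutoff on each scale.
n_orig    <- sum(time > ymax)      # 0: no original time exceeds ymax
n_shifted <- sum(shifted > ymax)   # 4: every shifted time exceeds ymax

print(c(original = n_orig, shifted = n_shifted))
```

Under this assumption, a ymax that is harmless on the original scale cuts into every observation on the shifted scale unless the cutoff is shifted along with the times.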

For example, every call below that sets ymax uses ymax > max(dfr$ptime), so each should give the same result as the call without ymax. However, the correct result is returned only when keepstrata = TRUE (that is, when the "special trickery" is not used).

library(survival)

dfr <- mgus2[!is.na(mgus2$mspike),]

## create 12 strata
dfr$stratum <- cut(dfr$age, breaks = c(0, seq(45, 100, 5)))

## fit models
fit1 <- coxph(Surv(ptime, pstat) ~ mspike + strata(stratum),
              dfr)

## without ymax
concordance(Surv(ptime, pstat) ~ predict(fit1) + strata(stratum),
            data = dfr)

## with ymax
concordance(Surv(ptime, pstat) ~ predict(fit1) + strata(stratum),
            data = dfr,
            keepstrata = TRUE,
            ymax = 500)

concordance(Surv(ptime, pstat) ~ predict(fit1) + strata(stratum),
            data = dfr,
            keepstrata = FALSE,
            ymax = 500)

concordance(Surv(ptime, pstat) ~ predict(fit1) + strata(stratum),
            data = dfr,
            keepstrata = TRUE,
            ymax = 2000)

concordance(Surv(ptime, pstat) ~ predict(fit1) + strata(stratum),
            data = dfr,
            keepstrata = FALSE,
            ymax = 2000)
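
The same comparison can be collected into one table so that the discrepancy is easier to spot. This is only a sketch: `cstat` is a hypothetical helper, and it assumes the overall statistic is stored in the `concordance` component of the returned object and that passing `ymax = NULL` is equivalent to not setting it.

```r
library(survival)

# Rebuild the data and model from the example above.
dfr <- mgus2[!is.na(mgus2$mspike), ]
dfr$stratum <- cut(dfr$age, breaks = c(0, seq(45, 100, 5)))
fit1 <- coxph(Surv(ptime, pstat) ~ mspike + strata(stratum), dfr)

# Hypothetical helper: return only the concordance value for one setting.
cstat <- function(keepstrata, ymax = NULL) {
  fit <- concordance(Surv(ptime, pstat) ~ predict(fit1) + strata(stratum),
                     data = dfr, keepstrata = keepstrata, ymax = ymax)
  unname(fit$concordance)
}

# All four combinations used above, next to a reference value without ymax.
settings <- expand.grid(keepstrata = c(TRUE, FALSE), ymax = c(500, 2000))
settings$concordance <- mapply(cstat, settings$keepstrata, settings$ymax)
settings$reference   <- cstat(keepstrata = TRUE)

print(settings)
```

Since both ymax values exceed max(dfr$ptime), every row would be expected to match the reference column.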

therneau commented 1 year ago

Good catch. I hadn't thought about ymax/ymin when doing the 'trickery' code. In my defense, I can't think of a case where I would want to use ymin or ymax in case-control data. Could you give some context, just for my education? (It's still a bug.)

neilstats commented 1 year ago

I found this while working with survival data that had more than 10 strata, rather than with case-control data.