therneau / survival

Survival package for R
394 stars 106 forks source link

predict.coxph expected values incorrect when formula contains pspline term and strata #261

Open RobertScott42 opened 5 months ago

RobertScott42 commented 5 months ago

Test using R 4.4.0 and survival 3.5-8

df1 <- cancer[ , c("time", "status", "age", "sex")]
testPoly <- coxph(Surv(time, status) ~ poly(age, 3) + strata(sex), data = df1)
termplot(testPoly, terms = 1, se=TRUE)
testPredPoly <- predict(testPoly, type="expected")

testSpline <- coxph(Surv(time, status) ~ pspline(age, df = 3) + strata(sex), data = df1)
termplot(testSpline, terms = 1, se=TRUE)
testPredSpline <- predict(testSpline, type="expected")

print(paste("Total events:", sum(df1$status==2)))
# 165
print(paste("Total expected (poly):", sum(testPredPoly)))
# 165
print(paste("Total expected (pspline):", sum(testPredSpline)))
# 506.877

predict appears to work fine with pspline when there are no strata terms. The termplot indicates that the spline fit is reasonable; it is just predict output that appears to be incorrect.

RobertScott42 commented 5 months ago

I forgot to mention that this appears to works OK with newdata, which I guess is the most common use-case. A workaround is to supply the original data via the newdata argument, i.e.: predict(testSpline, newdata = df1, type="expected") This gives a total expected of 165.2 (not quite the same, but close enough for most purposes).

RobertScott42 commented 4 months ago

Sorry to drip-feed information, but I found that there is no problem when using splines::bs or splines::ns. The problem appears to be confined to pspline in coxph with strata. That is my workaround for the time being, though I would rather use pspline if I could.