therneau / survival

Survival package for R
381 stars 104 forks source link

`survfit.coxphms()` produces an error with more than one strata #243

Closed samrickman closed 6 months ago

samrickman commented 7 months ago

It appears that survfit.coxphms() produces an error when the competing risks model has more than one stratification level:

mgus2$etime <- with(mgus2, ifelse(pstat == 0, futime, ptime))
event <- with(mgus2, ifelse(pstat == 0, 2 * death, 1))
mgus2$event <- factor(event, 0:2, labels = c("censor", "pcm", "death"))

# Create another strata variable
mgus2$age_group <- cut(mgus2$age, 4)

# Create model with stratification
cfit <- coxph(Surv(etime, event) ~ strata(age_group) + strata(sex) + mspike, mgus2, id = id)

# Fit model to new data
dummy <- expand.grid(sex = c("F", "M"), age = c(60, 80), mspike = 1.2)
csurv <- survfit(cfit, newdata = dummy)
# Error in .subset2(x, i, exact = exact) : no such index at level 1

The error appears to be caused by this line of code in survfit.coxphms():

if (has.strata && !is.null(mf2[[stangle$vars]]))

In this case, stangle$vars is c("strata(age_group)", "strata(sex)"). As its length is > 1, [[ throws an error.

Possible fix

I forked the repo and changed the line to:

if (has.strata && !all(vapply(stangle$vars, function(x) is.null(mf2[[x]]), logical(1))))

I can now run:

csurv <- survfit(cfit, newdata = dummy)

I'm not familiar with the mgus2 dataset but the plot looks OK to me:

plot(
    csurv["(42,60], F", , "pcm"], xscale = 12, mark.time = FALSE, lwd = 2,
    xlab = "Years post diagnosis", ylab = "Probability of PCM"
)
lines(csurv["(42,60], M", , "pcm"], col = 2, lty = 2, lwd = 2)
legend("topleft", c("(42,60], F", "(42,60], M"), col = 1:2, lwd = 3, cex = 2)

image

This amended version of the function produces identical() output to the previous version of the function in cases where that version worked (i.e. with one strata).

I see your README.md says you don't use git for pull requests so I haven't submitted one. In any case, I imagine you might want to test this a little more. Here is the output of R CMD check after I made the change.

R CMD check output * using log directory ‘/home/sam/r/survival-fork/survival/..Rcheck’ * using R version 4.2.2 Patched (2022-11-10 r83330) * using platform: x86_64-pc-linux-gnu (64-bit) * using session charset: UTF-8 * checking for file ‘./DESCRIPTION’ ... OK * this is package ‘survival’ version ‘3.5-6’ * checking package namespace information ... OK * checking package dependencies ... OK * checking if this is a source package ... WARNING Subdirectory ‘survival/src’ contains apparent object files/libraries agexact.o agfit4.o agfit5.o agmart.o agmart3.o agscore2.o agsurv4.o agsurv5.o cdecomp.o chinv2.o chinv3.o cholesky2.o cholesky3.o cholesky5.o chsolve2.o chsolve3.o chsolve5.o collapse.o concordance1.o concordance3.o concordance5.o cox_Rcallback.o coxcount1.o coxdetail.o coxexact.o coxfit5.o coxfit6.o coxmart.o coxmart2.o coxph_wtest.o coxsafe.o coxscho.o coxscore2.o coxsurv1.o coxsurv2.o coxsurv3.o coxsurv4.o dmatrix.o doloop.o fastkm.o finegray.o gchol.o init.o multicheck.o norisk.o pyears1.o pyears2.o pyears3b.o pystep.o survConcordance.o survdiff2.o survfit4.o survfitci.o survfitkm.o survfitresid.o survival.so survpenal.o survreg6.o survreg7.o survregc1.o survregc2.o survsplit.o tmerge.o zph1.o zph2.o Object files/libraries should not be included in a source package. * checking if there is a namespace ... OK * checking for executable files ... OK * checking for hidden files and directories ... NOTE Found the following hidden files and directories: .hgignore ..Rcheck .git .vscode These were most likely included in error. See section ‘Package structure’ in the ‘Writing R Extensions’ manual. * checking for portable file names ... OK * checking for sufficient/correct file permissions ... OK * checking whether package ‘survival’ can be installed ... OK * checking installed package size ... OK * checking package directory ... OK * checking DESCRIPTION meta-information ... NOTE Checking should be performed on sources prepared by ‘R CMD build’. * checking top-level files ... OK * checking for left-over files ... OK * checking index information ... OK * checking package subdirectories ... WARNING Found the following directory with the name of a check directory: ./..Rcheck Most likely, these were included erroneously. * checking R files for non-ASCII characters ... OK * checking R files for syntax errors ... OK * checking whether the package can be loaded ... OK * checking whether the package can be loaded with stated dependencies ... OK * checking whether the package can be unloaded cleanly ... OK * checking whether the namespace can be loaded with stated dependencies ... OK * checking whether the namespace can be unloaded cleanly ... OK * checking dependencies in R code ... OK * checking S3 generic/method consistency ... OK * checking replacement functions ... OK * checking foreign function calls ... OK * checking R code for possible problems ... OK * checking Rd files ... OK * checking Rd metadata ... OK * checking Rd cross-references ... OK * checking for missing documentation entries ... OK * checking for code/documentation mismatches ... OK * checking Rd \usage sections ... OK * checking Rd contents ... OK * checking for unstated dependencies in examples ... OK * checking contents of ‘data’ directory ... OK * checking data for non-ASCII characters ... OK * checking LazyData ... OK * checking data for ASCII and uncompressed saves ... OK * checking line endings in C/C++/Fortran sources/headers ... OK * checking line endings in Makefiles ... OK * checking for GNU extensions in Makefiles ... OK * checking include directives in Makefiles ... OK * checking compiled code ... OK * checking files in ‘vignettes’ ... WARNING Files in the 'vignettes' directory but no files in 'inst/doc': ‘adjcurve.Rnw’, ‘approximate.Rnw’, ‘compete.Rnw’, ‘concordance.Rnw’, ‘discrim.Rnw’, ‘multi.Rnw’, ‘other.Rnw’, ‘population.Rnw’, ‘redistribute.Rnw’, ‘refer.bib’, ‘splines.Rnw’, ‘survival.Rnw’, ‘tiedtimes.Rnw’, ‘timedep.Rnw’, ‘validate.Rnw’ * checking examples ... OK * checking differences from ‘survival-Ex.Rout’ to ‘survival-Ex.Rout.save’ ... OK * checking for unstated dependencies in ‘tests’ ... OK * checking tests ... OK Running ‘aareg.R’ Comparing ‘aareg.Rout’ to ‘aareg.Rout.save’ ... OK Running ‘anova.R’ Comparing ‘anova.Rout’ to ‘anova.Rout.save’ ... OK Running ‘bladder.R’ Comparing ‘bladder.Rout’ to ‘bladder.Rout.save’ ... OK Running ‘book1.R’ Comparing ‘book1.Rout’ to ‘book1.Rout.save’ ... OK Running ‘book2.R’ Comparing ‘book2.Rout’ to ‘book2.Rout.save’ ... OK Running ‘book3.R’ Comparing ‘book3.Rout’ to ‘book3.Rout.save’ ... OK Running ‘book4.R’ Comparing ‘book4.Rout’ to ‘book4.Rout.save’ ... OK Running ‘book5.R’ Comparing ‘book5.Rout’ to ‘book5.Rout.save’ ... OK Running ‘book6.R’ Comparing ‘book6.Rout’ to ‘book6.Rout.save’ ... OK Running ‘book7.R’ Comparing ‘book7.Rout’ to ‘book7.Rout.save’ ... OK Running ‘brier.R’ Comparing ‘brier.Rout’ to ‘brier.Rout.save’ ... OK Running ‘cancer.R’ Comparing ‘cancer.Rout’ to ‘cancer.Rout.save’ ... OK Running ‘checkSurv2.R’ Comparing ‘checkSurv2.Rout’ to ‘checkSurv2.Rout.save’ ... OK Running ‘clogit.R’ Comparing ‘clogit.Rout’ to ‘clogit.Rout.save’ ... OK Running ‘concordance.R’ Comparing ‘concordance.Rout’ to ‘concordance.Rout.save’ ... OK Running ‘concordance2.R’ Comparing ‘concordance2.Rout’ to ‘concordance2.Rout.save’ ... OK Running ‘concordance3.R’ Comparing ‘concordance3.Rout’ to ‘concordance3.Rout.save’ ... OK Running ‘counting.R’ Comparing ‘counting.Rout’ to ‘counting.Rout.save’ ... OK Running ‘coxsurv.R’ Comparing ‘coxsurv.Rout’ to ‘coxsurv.Rout.save’ ... OK Running ‘coxsurv2.R’ Comparing ‘coxsurv2.Rout’ to ‘coxsurv2.Rout.save’ ... OK Running ‘coxsurv3.R’ Comparing ‘coxsurv3.Rout’ to ‘coxsurv3.Rout.save’ ... OK Running ‘coxsurv4.R’ Comparing ‘coxsurv4.Rout’ to ‘coxsurv4.Rout.save’ ... OK Running ‘coxsurv5.R’ Comparing ‘coxsurv5.Rout’ to ‘coxsurv5.Rout.save’ ... OK Running ‘coxsurv6.R’ Comparing ‘coxsurv6.Rout’ to ‘coxsurv6.Rout.save’ ... OK Running ‘detail.R’ Comparing ‘detail.Rout’ to ‘detail.Rout.save’ ... OK Running ‘difftest.R’ Comparing ‘difftest.Rout’ to ‘difftest.Rout.save’ ... OK Running ‘doaml.R’ Comparing ‘doaml.Rout’ to ‘doaml.Rout.save’ ... OK Running ‘doweight.R’ Comparing ‘doweight.Rout’ to ‘doweight.Rout.save’ ... OK Running ‘dropspecial.R’ Comparing ‘dropspecial.Rout’ to ‘dropspecial.Rout.save’ ... OK Running ‘expected.R’ Comparing ‘expected.Rout’ to ‘expected.Rout.save’ ... OK Running ‘expected2.R’ Comparing ‘expected2.Rout’ to ‘expected2.Rout.save’ ... OK Running ‘factor.R’ Comparing ‘factor.Rout’ to ‘factor.Rout.save’ ... OK Running ‘factor2.R’ Comparing ‘factor2.Rout’ to ‘factor2.Rout.save’ ... OK Running ‘finegray.R’ Comparing ‘finegray.Rout’ to ‘finegray.Rout.save’ ... OK Running ‘fr_cancer.R’ Comparing ‘fr_cancer.Rout’ to ‘fr_cancer.Rout.save’ ... OK Running ‘fr_kidney.R’ Comparing ‘fr_kidney.Rout’ to ‘fr_kidney.Rout.save’ ... OK Running ‘fr_lung.R’ Comparing ‘fr_lung.Rout’ to ‘fr_lung.Rout.save’ ... OK Running ‘fr_ovarian.R’ Comparing ‘fr_ovarian.Rout’ to ‘fr_ovarian.Rout.save’ ... OK Running ‘fr_rat1.R’ Comparing ‘fr_rat1.Rout’ to ‘fr_rat1.Rout.save’ ... OK Running ‘fr_resid.R’ Comparing ‘fr_resid.Rout’ to ‘fr_resid.Rout.save’ ... OK Running ‘fr_simple.R’ Comparing ‘fr_simple.Rout’ to ‘fr_simple.Rout.save’ ... OK Running ‘frailty.R’ Comparing ‘frailty.Rout’ to ‘frailty.Rout.save’ ... OK Running ‘frank.R’ Comparing ‘frank.Rout’ to ‘frank.Rout.save’ ... OK Running ‘infcox.R’ Comparing ‘infcox.Rout’ to ‘infcox.Rout.save’ ... OK Running ‘jasa.R’ Comparing ‘jasa.Rout’ to ‘jasa.Rout.save’ ... OK Running ‘model.matrix.R’ Comparing ‘model.matrix.Rout’ to ‘model.matrix.Rout.save’ ... OK Running ‘mstate.R’ Comparing ‘mstate.Rout’ to ‘mstate.Rout.save’ ... OK Running ‘mstrata.R’ Comparing ‘mstrata.Rout’ to ‘mstrata.Rout.save’ ... OK Running ‘multi2.R’ Comparing ‘multi2.Rout’ to ‘multi2.Rout.save’ ... OK Running ‘multi3.R’ Comparing ‘multi3.Rout’ to ‘multi3.Rout.save’ ... OK Running ‘multistate.R’ Comparing ‘multistate.Rout’ to ‘multistate.Rout.save’ ... OK Running ‘neardate.R’ Comparing ‘neardate.Rout’ to ‘neardate.Rout.save’ ... OK Running ‘nested.R’ Comparing ‘nested.Rout’ to ‘nested.Rout.save’ ... OK Running ‘nsk.R’ Comparing ‘nsk.Rout’ to ‘nsk.Rout.save’ ... OK Running ‘ovarian.R’ Comparing ‘ovarian.Rout’ to ‘ovarian.Rout.save’ ... OK Running ‘overlap.R’ Comparing ‘overlap.Rout’ to ‘overlap.Rout.save’ ... OK Running ‘plot.R’ Running ‘prednew.R’ Comparing ‘prednew.Rout’ to ‘prednew.Rout.save’ ... OK Running ‘pseudo.R’ Comparing ‘pseudo.Rout’ to ‘pseudo.Rout.save’ ... OK Running ‘pspline.R’ Comparing ‘pspline.Rout’ to ‘pspline.Rout.save’ ... OK Running ‘pyear.R’ Comparing ‘pyear.Rout’ to ‘pyear.Rout.save’ ... OK Running ‘quantile.R’ Comparing ‘quantile.Rout’ to ‘quantile.Rout.save’ ... OK Running ‘r_lung.R’ Comparing ‘r_lung.Rout’ to ‘r_lung.Rout.save’ ... OK Running ‘r_resid.R’ Comparing ‘r_resid.Rout’ to ‘r_resid.Rout.save’ ... OK Running ‘r_sas.R’ Comparing ‘r_sas.Rout’ to ‘r_sas.Rout.save’ ... OK Running ‘r_scale.R’ Comparing ‘r_scale.Rout’ to ‘r_scale.Rout.save’ ... OK Running ‘r_stanford.R’ Comparing ‘r_stanford.Rout’ to ‘r_stanford.Rout.save’ ... OK Running ‘r_strata.R’ Comparing ‘r_strata.Rout’ to ‘r_strata.Rout.save’ ... OK Running ‘r_tdist.R’ Comparing ‘r_tdist.Rout’ to ‘r_tdist.Rout.save’ ... OK Running ‘r_user.R’ Comparing ‘r_user.Rout’ to ‘r_user.Rout.save’ ... OK Running ‘ratetable.R’ Comparing ‘ratetable.Rout’ to ‘ratetable.Rout.save’ ... OK Running ‘residsf.R’ Comparing ‘residsf.Rout’ to ‘residsf.Rout.save’ ... OK Running ‘royston.R’ Comparing ‘royston.Rout’ to ‘royston.Rout.save’ ... OK Running ‘rttright.R’ Comparing ‘rttright.Rout’ to ‘rttright.Rout.save’ ... OK Running ‘singtest.R’ Comparing ‘singtest.Rout’ to ‘singtest.Rout.save’ ... OK Running ‘strata2.R’ Comparing ‘strata2.Rout’ to ‘strata2.Rout.save’ ... OK Running ‘stratatest.R’ Comparing ‘stratatest.Rout’ to ‘stratatest.Rout.save’ ... OK Running ‘summary_survfit.R’ Comparing ‘summary_survfit.Rout’ to ‘summary_survfit.Rout.save’ ... OK Running ‘surv.R’ Comparing ‘surv.Rout’ to ‘surv.Rout.save’ ... OK Running ‘survSplit.R’ Comparing ‘survSplit.Rout’ to ‘survSplit.Rout.save’ ... OK Running ‘survcheck.R’ Comparing ‘survcheck.Rout’ to ‘survcheck.Rout.save’ ... OK Running ‘survfit1.R’ Comparing ‘survfit1.Rout’ to ‘survfit1.Rout.save’ ... OK Running ‘survfit2.R’ Comparing ‘survfit2.Rout’ to ‘survfit2.Rout.save’ ... OK Running ‘survreg1.R’ Comparing ‘survreg1.Rout’ to ‘survreg1.Rout.save’ ... OK Running ‘survreg2.R’ Comparing ‘survreg2.Rout’ to ‘survreg2.Rout.save’ ... OK Running ‘survtest.R’ Comparing ‘survtest.Rout’ to ‘survtest.Rout.save’ ... OK Running ‘testci.R’ Comparing ‘testci.Rout’ to ‘testci.Rout.save’ ... OK Running ‘testci2.R’ Comparing ‘testci2.Rout’ to ‘testci2.Rout.save’ ... OK Running ‘testnull.R’ Comparing ‘testnull.Rout’ to ‘testnull.Rout.save’ ... OK Running ‘testreg.R’ Comparing ‘testreg.Rout’ to ‘testreg.Rout.save’ ... OK Running ‘tiedtime.R’ Comparing ‘tiedtime.Rout’ to ‘tiedtime.Rout.save’ ... OK Running ‘tmerge.R’ Comparing ‘tmerge.Rout’ to ‘tmerge.Rout.save’ ... OK Running ‘tmerge2.R’ Comparing ‘tmerge2.Rout’ to ‘tmerge2.Rout.save’ ... OK Running ‘tmerge3.R’ Comparing ‘tmerge3.Rout’ to ‘tmerge3.Rout.save’ ... OK Running ‘tt.R’ Comparing ‘tt.Rout’ to ‘tt.Rout.save’ ... OK Running ‘tt2.R’ Running ‘turnbull.R’ Comparing ‘turnbull.Rout’ to ‘turnbull.Rout.save’ ... OK Running ‘update.R’ Comparing ‘update.Rout’ to ‘update.Rout.save’ ... OK Running ‘yates0.R’ Comparing ‘yates0.Rout’ to ‘yates0.Rout.save’ ...63c63 < 1 2 3 --- > A B C Running ‘yates1.R’ Comparing ‘yates1.Rout’ to ‘yates1.Rout.save’ ... OK Running ‘yates2.R’ Running ‘zph.R’ Comparing ‘zph.Rout’ to ‘zph.Rout.save’ ... OK * checking for unstated dependencies in vignettes ... OK * checking package vignettes in ‘inst/doc’ ... WARNING Directory 'inst/doc' does not exist. Package vignettes without corresponding single PDF/HTML: ‘adjcurve.Rnw’ ‘approximate.Rnw’ ‘compete.Rnw’ ‘concordance.Rnw’ ‘discrim.Rnw’ ‘multi.Rnw’ ‘other.Rnw’ ‘population.Rnw’ ‘redistribute.Rnw’ ‘splines.Rnw’ ‘survival.Rnw’ ‘tiedtimes.Rnw’ ‘timedep.Rnw’ ‘validate.Rnw’ * checking running R code from vignettes ... OK ‘adjcurve.Rnw’... OK ‘approximate.Rnw’... OK ‘compete.Rnw’... OK ‘concordance.Rnw’... OK ‘discrim.Rnw’... OK ‘multi.Rnw’... OK ‘other.Rnw’... OK ‘population.Rnw’... OK ‘redistribute.Rnw’... OK ‘splines.Rnw’... OK ‘survival.Rnw’... OK ‘tiedtimes.Rnw’... OK ‘timedep.Rnw’... OK ‘validate.Rnw’... OK * checking re-building of vignette outputs ... OK * checking PDF version of manual ... OK * DONE Status: 4 WARNINGs, 2 NOTEs

Seems OK to me - the tests ran without any issues, no errors were raised, and the only warnings were there previously or caused by my IDE (e.g. .vscode directory).

samrickman commented 6 months ago

A quick update after an email discussion with @therneau who has pointed out that this error can be avoided by fitting the model with a different syntax. From the email:

For a Cox model, a fit with strata(x1) + strata(x2) is the same as one with strata(x1, x2). The same likelihood, just a different syntax.

In this case this means:

# Previous syntax
cfit <- coxph(Surv(etime, event) ~ strata(age_group) + strata(sex) + mspike, mgus2, id = id)
# Syntax to avoid the error
cfit2 <- coxph(Surv(etime, event) ~ strata(age_group, sex) + mspike, mgus2, id = id)

We can then compare the output of my survfit() "fix" with the current survfit() using this alternative syntax:

# Previous syntax with the change in survfit() I defined above
csurv_new <- new_survfit(cfit, newdata = dummy)
# Using the current survfit()
csurv  <- survfit(cfit2, newdata = dummy)

These produce (almost) identical objects:

mapply(identical, csurv, csurv_new)
#           n        time      n.risk     n.event    n.censor      pstate 
#        TRUE        TRUE        TRUE        TRUE        TRUE        TRUE 
#          p0      strata      cumhaz  start.time transitions      states 
#        TRUE        TRUE        TRUE        TRUE        TRUE        TRUE 
#        type     newdata        call 
#        TRUE        TRUE       FALSE 

As expected the only difference is the $call, because the name of the model object is different:

csurv_new$call # survfit(formula = cfit, newdata = dummy)
csurv$call # survfit(formula = cfit2, newdata = dummy)

So this functionality already exists in the package. However, as it was not obvious to me, this may be the case with others and there might be a benefit to supporting both syntaxes.

therneau commented 6 months ago

This should now be repaired