hputter / mstate

https://hputter.github.io/mstate/
7 stars 5 forks source link

`msfit`: Error when strata variable has one level #14

Closed MetzgerSK closed 1 year ago

MetzgerSK commented 2 years ago

R: 4.1.3 64-bit, mstate: 0.3.2, survival: 3.3.0, Win10

Problem

For a non-parametric Cox model with a single stratum defined by its specified strata() variable, msfit produces an error:

Error in data.frame(time = sf0$time, Haz = -log(sf0$surv), norisk = norisk,  :
  arguments imply differing number of rows: 4, 0

Problem's Source

The issue is msfit's lines 159–161:

sf0 <- data.frame(time = sf0$time, Haz = -log(sf0$surv),
    norisk = norisk, noevent = noevent, var = sf0$std.err^2/(sf0$surv)^2,
    trans = as.numeric(sf0$strata))

Prior to these lines, sf0 contains summary(survfit(object)). sf0$strata evaluates to NULL when the strata() term's variable has only one level. Thus, as.numeric(sf0$strata)) returns with length 0, whereas every other quantity in the to-be-formed data frame has length != 0.

Context

This model's an odd edge case, to be sure. I'm not sure how many people would run into this issue in practice. For me, I was writing unit tests for mstate's Stata equivalent, and some of those tests involve comparing R-mstate output with Stata's equivalent. The comparison code's written to work with both non-parametric and semi-parametric model specifications, which is the only reason I stumbled across the issue.

Nevertheless, I didn't see anything in msfit's help file about this specific situation, nor did I see any open or closed GitHub issues about it, so I wanted to bring it to your attention.

MWE

library(survival)
library(mstate)
library(dplyr)

# (Yoinked from coxph's examples, but swapped in mstate varnames)
# Create the simplest test data set
test1 <- list(stop=c(4,3,1,1,2,2,3),
              status=c(1,1,1,0,1,1,0),
              x=c(0,2,1,1,1,0,0),
              sex=c(0,0,0,0,1,1,1),
              trans=c(1,1,1,1,1,1,1))
test1 <- test1 %>%
           as.data.frame(.) %>%
           mutate(start = 0,
                  from = 1,
                  to = 2)

    ## Trans mat
    tmat <- transMat(list(c(2),c()))

    ## Set mstate attributes
    attr(test1, "trans") <- tmat
    class(test1) <- c("msdata", "data.frame")

#---------- ATTEMPT 1 -------------
# Model
mod <- coxph(Surv(start, stop, status) ~ strata(trans), test1)
mod.sf <- survfit(mod)

# msfit (will throw error)
msfit(mod, trans=tmat)

#---------- ATTEMPT 2 -------------
# Model
mod2 <- coxph(Surv(start, stop, status) ~ 1, test1)
mod2.sf <- survfit(mod2)

    # survfit objs equiv, as they should be?
    waldo::compare(mod.sf, mod2.sf) # YES (aside from mod obj name passed into survfit)

# msfit (all will throw an error)
msfit(mod2, trans=tmat)                            # wants newdata object
msfit(mod2, trans=tmat,
      newdata=c(1, strata=1))                      # 'data' must be a data.frame, environment, or list
msfit(mod2, trans=tmat,
      newdata=data.frame(takeAGuess=1, strata=1))  # incorrect number of dimensions
edbonneville commented 2 years ago

@MetzgerSK apologies for the (super) late reply, but thank you already for taking the time to raise this and the other issues! I will work through them in the coming weeks.

Also really nice that you mention writing of unit tests for the Stata equivalent - we (together with @hputter ) would also like to start doing the same for {mstate} in R.. will probably start trying some things out with {tinytest} 😃

edbonneville commented 1 year ago

@MetzgerSK thank you again for raising this and the other issues.

Commit 09f421f addresses this issue and #15 . The main change is (and we will have to make this explicit in documentation too) to return an error if coxph() object fed into msfit() does not contain a strata() in the formula. The {survival} package should remain the tool of choice for prediction for models without strata.

It is however fair to accommodate the edge case of a strata with just one level (it works for that now); we may perhaps add a warning at a later stage.

MetzgerSK commented 1 year ago

@MetzgerSK thank you again for raising this and the other issues.

Of course, thanks for putting together and maintaining the package.

I know of one more (potential?) issue for which I've been meaning to file a report. I'll go ahead and do that within the next few hours, before I forget.