hputter / mstate

https://hputter.github.io/mstate/
7 stars 5 forks source link

`mssample`: Discrepant Gap vs. Clock Output #23

Open MetzgerSK opened 1 year ago

MetzgerSK commented 1 year ago

R: 4.1.3 64-bit, mstate: 0.3.2, survival: 3.5.0, Win11

Problem

mssample's output is different in situations where it seems like it shouldn't be. The discrepency arises for any situation in which (a) gap vs. clock time are identical metrics (e.g., situations where the starting stage has exiting transitions, like basic Cox or competing risk stage structures) and (b) when mssample()'s specified starting time $\neq 0$.

Problem's Source

The discrepency's stemming from these lines in mssample1() (lines 37–45):

if (clock == "forward") {
    crs <- crsample(Haz[whh, ], tcond, censtime)
    tcond <- Tstop <- crs$t
}
else {
    crs <- crsample(Haz[whh, ], t0, censtime)
    t0 <- 0
    tcond <- Tstop <- crs$t + tcond
}

The first time a subject transitions represents the actual time at which that event occurs: a transition occurring at, e.g., $t = 7$ for clock is equivalent to a transition occurring at $t=7$ for gap (only possible if the start time's $<7$, of course). Tstop should therefore equal 7 for both metrics after the first transition in a basic Cox or CR setup.

At present, that's not what will happen. To continue the running example, Tstop will correctly = 7 after the subject's first transition for clock time. However, for gap time, Tstop will equal 7 + tcond.

Earlier, tcond is defined as (mssample1(), line 15):

tcond <- t0 <- Tstart <- history$time

history$time is the start time set by the user in the original mssample() call. If the start time is non-zero, the gap time's Tstop != 7 after the first transition.

Seems like we shouldn't be adding tcond for a subject's first transition for a gap-time duration. (We do still need to add tcond in gap time for any of the subject's subsequent transitions, though.)

MWE

fake_CR.csv

# (See file attached to GH issue for demo data.  The dput statement was too long to reasonably paste in.)

rm(list=ls())
library(tidyverse)
library(mstate)
library(pbapply)
library(tibble)
library(testthat)
library(rlang)

# Load data ----
dat <- read.csv("fake_CR.csv")  # (see file attached to GH issue)

# Estm model ----
mod <- coxph(Surv(t,status) ~ x1_1 + x2_1 + x1_2 + x2_2 + x1_3 + x2_3 + 
                              strata(trans), data=dat, timefix=FALSE)

# Set mstate details ----
tmat <- trans.comprisk(3)
to.trans2(tmat)
covPrf <-tribble(
    ~x1_1,  ~x1_2,  ~x1_3,  ~x2_1,   ~x2_2,  ~x2_3, ~trans, ~strata,
    0.1,    0,      0,      -0.1,   0,      0,      1,      1,
    0,      0.1,    0,      0,      -0.1,   0,      2,      2,
    0,      0,      0.1,    0,      0,      -0.1,   3,      3)
  attr(covPrf, "trans") <- tmat
  class(covPrf) <- c("msdata", "data.frame")

# msfit ---- 
msf.dat <- msfit(mod, covPrf, trans=tmat) 

# mssample ----
## Setup details ====
stime <- 1
incrm <- 1
tvec <- seq(1,10, by=incrm)
nReps <- 10     # do several draws, to rule out extreme flukes

mssampStmt <- expr(mssample(
                    Haz = msf.dat$Haz,
                    trans = tmat,
                    history = list(state = 1, time = stime),
                    clock = "forward",
                    M = 1,           
                    output = "state",
                    tvec = tvec
              ))
mutSimStmt <- expr(ceiling(row_number()/((max(tvec)-min(tvec)) / incrm + 1)))

## Run mssamp ====
### Clock ####
set.seed(23)
ms.dat <-
    do.call(rbind,
            pbreplicate(
                nReps,
                eval(mssampStmt),
                simplify = FALSE
            )) %>%
              mutate(simID=!!mutSimStmt) %>%
              relocate(simID)

### Gap ####
set.seed(23)
ms.dat.gap <-
    do.call(rbind,
            pbreplicate(
                nReps,
                call_modify(mssampStmt, clock="reset") %>% eval,
                simplify = FALSE
            )) %>%
              mutate(simID=!!mutSimStmt) %>%
              relocate(simID)

# Examine: Compare the mssample output ----
## View ====
cbind(ms.dat, ms.dat.gap %>% rename_with(~paste0(.x,"_gap"))) %>% View

## Compare ====
### not what you'd expect - should be equal
expect_equal(ms.dat, ms.dat.gap)

### however, if you account for the non-zero start time:   
expect_equal(
    ms.dat %>%             
      filter(time<=max(tvec)-stime),
    ms.dat.gap %>% 
      mutate(time = time - stime) %>%
      filter(time>=stime)
)
MetzgerSK commented 1 year ago

Wanted to acknowledge: hadn't realized this was similar to #8 until now. Will still leave this issue open, just in case #8's problem has a different cause than the problem I describe here.