hputter / mstate

https://hputter.github.io/mstate/
7 stars 5 forks source link

Probability of state is >0% which is not possible - incorrect transition matrix specification? #29

Closed Jack-FG closed 2 months ago

Jack-FG commented 4 months ago

Hi!

I have the following multi-state model that I am trying to use with mstate.

image

I am able to generate the following

 outcome_states <- c("Starting State","State 1", "State 2", "State 3", "Absorbing 1","Absorbing 2")
rtperform_trans <- transMat(x = list(c(2,3,5,6), c(3,4,5,6), c(2,4,5,6), c(2,3,5,6), c(), c()), names = outcome_states)
  rtperform_trans

image

After generating the msdata object and using the events() function it returns image

This is as expected. It is important to note here that an individual will NEVER be in the 'Starting State' after time 1. This can be seen in the first row of the events() function return where no individuals are censored.

Due to this, the probability of being in the 'Starting State' should be 0 at and then after time 1.

When I build a multi-state model and generate the probabilities of being in states the plot(trans_prob) returns the graph I expect.

# build base multi-state model
ms_base_model <- coxph(Surv(start_time, end_time, status) ~ strata(trans),
                       data = msdf,
                       method = "breslow")

# Calculate msfit using specified transition matrix
ms_fit_base <- msfit(object = ms_base_model,
                     vartype = "greenwood",
                     trans = rtperform_trans)

# Generate transition probabilities
trans_prob <- probtrans(ms_fit_base, predt = 0, method = "greenwood")

# Plot
plot(trans_prob)

Graph expected: from 0-1 time the probability of the starting state is 100% and then from 1 onwards the probability of the starting state is 0. image

The challenge comes when co-variates are introduced.

I introduce 2 covariates and then generate the same probability transition plot, only difference is, now, the starting state has a probability that is non-zero after t=1, when this is impossible. image

Why is it that introducing covariates would cause this behaviour? Or has something else been missed. I can provide a full markdown with every step, but feel I could get closer to the answer quicker with info provided above.

MetzgerSK commented 3 months ago

A quick thing to check: If you print out the Haz element in your returned msfit object for the with-covariates model, for your covariate profile of interest (i.e., the data passed along to msfit's newdata argument), do all the outward transition hazards from your starting stage sum to 1 for $t=1$?

If not, that would explain what you're seeing. Everyone doesn't transition out of your starting stage because, if all subjects hypothetically have the covariate profile you selected, the various $h_q(t)$ values associated with transitions out of the starting stage aren't large enough to guarantee everyone will exit by $t=1$.

The subsequent plateau would also then be easy to explain. All Cox model predicted quantities, including transition probabilities, only change in value at $t$ s where failure events occur for a given failure event (or set of failure events, like exiting a stage) of interest. Since no one else transitions out of your starting stage after $t=1$, the transition probability for exiting that stage is constant for all $t>1$.

edbonneville commented 3 months ago

@Jack-FG thanks for getting in touch! Are you using transition-specific covariates? My guess is your model with covariates assumes common effects for the transitions out of starting state (which you do not care about, they are all out by $t = 1$ anyway), and the rest.. this would then be coherent with @MetzgerSK's explanation above (thanks!). Have a look at the reproducible example below:

# Simulate time-homogenous Markov model, 1 covariate, no censoring
set.seed(4896516)
library(mstate)
#> Warning: package 'mstate' was built under R version 4.3.2
#> Loading required package: survival
#> Warning: package 'survival' was built under R version 4.3.2
n <- 500
X1 <- rnorm(n = n)

# Everyone moves out of state 1 at t = 1, into state 2
# ..then standard competing events from state 2
time_out_of_state1 <- 1
time_state2 <- rep(time_out_of_state1, n) 
time_state3 <- rexp(n, rate = exp(0.5 * X1))
time_state4 <- rexp(n, rate = 0.5 * exp(0.5 * X1))
time_comp <- pmin(time_state3, time_state4)

# State indicators
ind_state2 <- rep(1, n)
ind_state3 <- as.numeric(time_state3 < time_state4)
ind_state4 <- as.numeric(time_state3 >= time_state4)

# Store all in data.frame
dat <- data.frame(
  id = seq_len(n),
  time_state2,
  time_comp = time_out_of_state1 + time_comp,
  ind_state2,
  ind_state3,
  ind_state4,
  X1
)

# Make transition matrix and long data
tmat <- transMat(x = list(2, c(3, 4), c(), c()))
tmat
#>          to
#> from      State 1 State 2 State 3 State 4
#>   State 1      NA       1      NA      NA
#>   State 2      NA      NA       2       3
#>   State 3      NA      NA      NA      NA
#>   State 4      NA      NA      NA      NA
msdat <- msprep(
  time = c(NA, "time_state2", "time_comp", "time_comp"),
  status = c(NA, "ind_state2", "ind_state3", "ind_state4"),
  data = dat,
  id = "id",
  keep = "X1",
  trans = tmat
)

# Checks + expand data
events(msdat)
#> $Frequencies
#>          to
#> from      State 1 State 2 State 3 State 4 no event total entering
#>   State 1       0     500       0       0        0            500
#>   State 2       0       0     356     144        0            500
#>   State 3       0       0       0       0      356            356
#>   State 4       0       0       0       0      144            144
#> 
#> $Proportions
#>          to
#> from      State 1 State 2 State 3 State 4 no event
#>   State 1   0.000   1.000   0.000   0.000    0.000
#>   State 2   0.000   0.000   0.712   0.288    0.000
#>   State 3   0.000   0.000   0.000   0.000    1.000
#>   State 4   0.000   0.000   0.000   0.000    1.000
msdat_expand <- expand.covs(msdat, covs = "X1")

# Non-parametric start
ms_base_model <- coxph(
  Surv(Tstart, Tstop, status) ~ strata(trans),
  data = msdat_expand,
  method = "breslow"
)
ms_fit_base <- msfit(object = ms_base_model, vartype = "greenwood", trans = tmat)
#> Warning in min(diff(time)): no non-missing arguments to min; returning Inf

# Generate transition probabilities
trans_prob <- probtrans(ms_fit_base, predt = 0, method = "greenwood")
plot(trans_prob, xlim = c(0, 3))


# Now with X1 transition-specific
ms_covs_model <- coxph(
  Surv(Tstart, Tstop, status) ~ X1.2 + X1.3 + strata(trans),
  data = msdat_expand,
  method = "breslow"
)

# Example new patient
newdat <- msdat_expand[msdat_expand$id == 266, ]
newdat$strata <- newdat$trans
newdat
#> An object of class 'msdata'
#> 
#> Data:
#>      id from to trans Tstart    Tstop      time status        X1      X1.1
#> 796 266    1  2     1      0 1.000000 1.0000000      1 -2.441694 -2.441694
#> 797 266    2  3     2      1 1.850324 0.8503237      1 -2.441694  0.000000
#> 798 266    2  4     3      1 1.850324 0.8503237      0 -2.441694  0.000000
#>          X1.2      X1.3 strata
#> 796  0.000000  0.000000      1
#> 797 -2.441694  0.000000      2
#> 798  0.000000 -2.441694      3

# Still ok
ms_fit_covs <- msfit(object = ms_covs_model, newdata = newdat, trans = tmat)
trans_prob <- probtrans(ms_fit_covs, predt = 0)
#> Warning in probtrans(ms_fit_covs, predt = 0): Warning! Negative diagonal elements of (I+dA); the estimate may not be meaningful.
plot(trans_prob, xlim = c(0, 3))


# Now the wrong way to go:
ms_covs_model2 <- coxph(
  Surv(Tstart, Tstop, status) ~ X1 + strata(trans),
  data = msdat_expand,
  method = "breslow"
)
ms_fit_covs <- msfit(object = ms_covs_model2, newdata = newdat, trans = tmat)
trans_prob <- probtrans(ms_fit_covs, predt = 0)
plot(trans_prob)

plot(ms_fit_covs) # see @MetzgerSK's explanation!

Created on 2024-06-13 with reprex v2.0.2

edbonneville commented 2 months ago

(Closing since this unlikely to be a issue with the package itself)