hesim-dev / hesim

Health economic simulation modeling and decision analysis
https://hesim-dev.github.io/hesim/

Censoring For Discrete Time Markov Models #107

Open swaheera opened 1 year ago

Hi Dr. Incerti,

Suppose I have data in R that looks like this:

df <- data.frame(patient_id = c(111,111,111, 111, 222, 222, 222), 
                 year = c(2010, 2011, 2012, 2013, 2011, 2012, 2013), 
                 gender = c("Male", "Male", "Male", "Male", "Female", "Female", "Female"), 
                 weight = c(98, 97, 102, 105, 87, 81, 83), 
                 state_at_year = c("healthy", "sick", "sicker", "sicker", "healthy", "sicker", "sicker"))

  patient_id year gender weight state_at_year
1        111 2010   Male     98       healthy
2        111 2011   Male     97          sick
3        111 2012   Male    102        sicker
4        111 2013   Male    105        sicker
5        222 2011 Female     87       healthy
6        222 2012 Female     81        sicker
7        222 2013 Female     83        sicker

To use this data with a discrete time Markov cohort model (https://hesim-dev.github.io/hesim/articles/mlogit.html), I would have to reshape it so that each row represents a transition between states:


  patient_id year_start year_end gender_start gender_end state_start state_end weight_start weight_end
1        111       2010     2011         Male       Male     healthy      sick           98         97
2        111       2011     2012         Male       Male        sick    sicker           97        102
3        111       2012     2013         Male       Male      sicker    sicker          102        105
4        222       2011     2012       Female     Female     healthy    sicker           87         81
5        222       2012     2013       Female     Female      sicker    sicker           81         83

structure(list(patient_id = c(111, 111, 111, 222, 222), year_start = c(2010, 
2011, 2012, 2011, 2012), year_end = c(2011, 2012, 2013, 2012, 
2013), gender_start = c("Male", "Male", "Male", "Female", "Female"
), gender_end = c("Male", "Male", "Male", "Female", "Female"), 
    state_start = c("healthy", "sick", "sicker", "healthy", "sicker"
    ), state_end = c("sick", "sicker", "sicker", "sicker", "sicker"
    ), weight_start = c(98, 97, 102, 87, 81), weight_end = c(97, 
    102, 105, 81, 83)), row.names = c(1L, 2L, 3L, 4L, 5L), class = "data.frame")
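
For reference, the reshaping above can be sketched in base R as follows. This is only a minimal illustration, not something taken from the hesim documentation: `make_transitions` is a hypothetical helper name, and the sketch assumes each patient's rows are consecutive yearly observations (a gap in years would silently produce a transition spanning more than one year).

```r
# Long-format panel data from the post: one row per patient per year.
df <- data.frame(
  patient_id = c(111, 111, 111, 111, 222, 222, 222),
  year = c(2010, 2011, 2012, 2013, 2011, 2012, 2013),
  gender = c("Male", "Male", "Male", "Male", "Female", "Female", "Female"),
  weight = c(98, 97, 102, 105, 87, 81, 83),
  state_at_year = c("healthy", "sick", "sicker", "sicker",
                    "healthy", "sicker", "sicker")
)

# Sort so that adjacent rows within a patient are adjacent observations.
df <- df[order(df$patient_id, df$year), ]

# Hypothetical helper: pair each observation of one patient with the next one.
make_transitions <- function(d) {
  n <- nrow(d)
  if (n < 2) return(NULL)          # a single observation yields no transition
  start <- d[-n, , drop = FALSE]   # every row except the last
  end <- d[-1, , drop = FALSE]     # every row except the first
  data.frame(
    patient_id = start$patient_id,
    year_start = start$year,
    year_end = end$year,
    gender_start = start$gender,
    gender_end = end$gender,
    state_start = start$state_at_year,
    state_end = end$state_at_year,
    weight_start = start$weight,
    weight_end = end$weight
  )
}

# Apply the helper per patient and stack the results.
transitions <- do.call(
  rbind,
  lapply(split(df, df$patient_id), make_transitions)
)
rownames(transitions) <- NULL
print(transitions)
```

Each patient's final observation appears only in the `*_end` columns, so a patient with n observations contributes n - 1 rows to `transitions`.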

It appears that there is no choice but to drop the last observation for each patient as a starting row, because no later observation exists to supply its end state. In other words, a patient with n rows in the original data contributes only n - 1 transitions, so we are forced to lose one row of data per patient.

When the patient experiences an absorbing event (e.g. death), this is not a problem. However, when the patient is right censored (i.e. has the event after the end of the study), it seems there is nothing we can do to account for the censoring other than removing the last row of data for that patient.

Is my understanding of this correct?

Thanks!