hputter / mstate

https://hputter.github.io/mstate/
7 stars 5 forks source link

Issue with LMAJ #5

Closed hputter closed 2 years ago

hputter commented 3 years ago

Dear Prof. Putter,

First of all, many thanks for the wonderful mstate package and all the effort you have put into it, it’s a delight to use. I do not know how actively you have time to maintain the package but I suspect there may be an issue with msfit() which I thought might be of your interest. I think it has something to do with how msfit() internally indexes transitions/strata and problem manifests when LMAJ() landmarks the data and then calls msfit() with reduced msdata that may not have observations for all transitions. What actually seems to happen is that msfit() recodes transitions incorrectly in this situation.

The problem simplified is following: If I try to estimate transition probabilities with LMAJ() starting from some non-initial state of a model (say, illness-death model and I want to start from the illness state), I get weird results (strictly zero probabilities for death). I tried to debug what happens and If I now understand the code correctly, it seems that landmarked data (with some transitions/strata missing) may not work well with msfit() that apparently assumes under the hood that all transitions are represented in the input data. I guess msfit() uses some internal indexing for transitions/strata and in this situation it seems to recode transitions in the data as a sequence starting from 1. => Thus, if my model and data has four transitions and landmarking leaves only transitions 3 & 4 -> they get codes 1 & 2 in the msfit() output. This gives weird estimates downstream in the LMAJ() output and can be seen if I plot.msfit() the cumulative hazard estimates in the middle of LMAJ() call (legend does not match with the actual plot).

I'm not a statistician by background so this may of course be something I'm just doing incorrectly. However, in my case, I seem to get empirically reasonable estimates if I manually correct/recode transition codes back to original just after LMAJ has called msfit() and before it calls probtrans(), but this is not a very sustainable approach. Also standard Markovian approach (coxph->msfit->probtrans) does not have any issues with the same data and transition matrix.

Best,

Just a quick addition, I may have found a quick way around / temporary fix. Two lines (104 and 168 of v0.2.10) in msfit.R having: sf0 <- ([…], trans=as.numeric(sf0$strata))

could be substituted with something like: sf0 <- ([…], trans=split_strata_toint(sf0$strata))

where: split_strata_toint <- function(strata_factor) sapply(strsplit(as.character(strata_factor), "="), function(x) as.integer(x[2]))

.. is a helper function that extracts the integer from the actual factor level string, instead of naïve numeric conversion.

hputter commented 3 years ago

Decided not to tackle this for mstate 0.3.1; Paavo to send small data set and code to illustrate the problem.

hputter commented 3 years ago

Further e-mail by Paavo:

Dear Hein,

I finally found some time to work with this and hopefully managed to create a simple script to simulate some data / replicate the issue in an overly simplified scenario. Please see R code attached.

The core of the issue seems to be following: When I start from the state 2, LMAJ() drops out transitions 1 and 2 in landmarking. Then it creates a coxph object left only with “trans=3” and “trans=4” and calls msfit(). The issue here is that msfit() extracts this factor of transitions using summary(survfit(object))$strata and casts it into an integer using as.numeric(). That conversion starts indexing from 1, when in fact it should return 3 and 4 instead to match the original transition numbers.

My naïve fix to this is to extract original transition numbers from the factor labels instead of calling as.numeric(). This requires basically just two changes to msfit() as I mentioned in an earlier email:

Just a quick addition, I may have found a quick way around / temporary fix. Two lines (104 and 168 of v0.2.10) in msfit.R having: sf0 <- ([…], trans=as.numeric(sf0$strata))

could be substituted with something like: sf0 <- ([…], trans=split_strata_toint(sf0$strata))

where: split_strata_toint <- function(strata_factor) sapply(strsplit(as.character(strata_factor), "="), function(x) as.integer(x[2]))

Please let me know if you cannot get the script working or if I have misunderstood something completely.

Thanks,

edbonneville commented 2 years ago

This has normally been fixed with version 0.3.2 - if anyone runs into this again, please open the issue again!

edbonneville commented 2 years ago

Revisited with 7bb7aa3 such that it does not modify msfit() directly