skent259 / rsmatch

Matching Methods for Time-Varying Observational Studies, in R
https://skent259.github.io/rsmatch/
Other

Constant trt-status in .coxps_match()? #23

Open eribul opened 2 months ago

eribul commented 2 months ago

Thank you very much for providing this package! I am new to the field and might have misunderstood something, but when reading the source code for .coxps_match() I noticed that on line 152 you have:

```r
data$trt <- ifelse(is.na(data$trt_time), 0, 1)
```

That is, an individual who is ever treated receives trt = 1 in every row of the long data format, whether or not the treatment occurs in that specific period. In other words, a treated individual is considered treated even before the treatment was performed. Stepping through the code with the example data set seems to confirm this. This data is then used in the Cox regression on line 163:

```r
model <- survival::coxph(form, data = data.cox)
```

Wouldn't this lead to a biased result?
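A minimal sketch of the concern, using a hypothetical toy data set (three subjects, four periods; the values are invented, and only the column names `id`, `time`, and `trt_time` follow this thread). The line-152 expression yields a `trt` value that is constant within each subject, while a row-level indicator (the hypothetical column `treated_now`) switches on only from the treatment period onward.

```r
# Hypothetical long-format data: one row per subject per period.
# trt_time is patient-level: the period of treatment, or NA if never treated.
data <- data.frame(
  id       = rep(1:3, each = 4),
  time     = rep(1:4, times = 3),
  trt_time = rep(c(3, NA, 2), each = 4)
)

# Expression from line 152: the same value in every row of a subject.
data$trt <- ifelse(is.na(data$trt_time), 0, 1)

# Hypothetical row-level alternative: 1 only from the treatment period onward.
# For never-treated subjects, !is.na(NA) is FALSE, so FALSE & NA gives FALSE.
data$treated_now <- as.integer(!is.na(data$trt_time) & data$time >= data$trt_time)

print(data)
```

Rows with `trt == 1` but `treated_now == 0` are exactly the pre-treatment periods of treated subjects that the question is about.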

Since this is not my area of expertise, I discussed the issue with ChatGPT, and for what it's worth I include its suggested GitHub issue text below (take it or leave it) 😀.

Thanks again for providing this package (personally I use your functions as inspiration, although not the package itself, since I needed to switch from data.frame to data.table syntax in order to work with big data).


Hello,

I've been working with time-dependent data using your package and noticed that after transforming the dataset into a long format, the event status variable (trt) remains constant for each individual across all time intervals. This approach seems problematic for correctly implementing a time-dependent Cox model, as it doesn't accurately reflect the time at which the event (e.g., receiving a fourth vaccine dose) actually occurs.

Problem Description

In a typical time-dependent Cox model, the event variable should indicate whether the event of interest (e.g., receiving a treatment or dose) has occurred within a specific time interval. This allows the model to appropriately account for the risk and timing of the event. If the event status is constant across all intervals for an individual, the model cannot distinguish between pre- and post-event periods, leading to potential misestimation of hazard ratios and an incorrect understanding of the time-dependent nature of the covariates.

Potential Consequences

Incorrect Hazard Estimation: The Cox model may produce biased estimates of the hazard ratios because it assumes the event either always occurs or never occurs, rather than reflecting the actual timing of the event.

Loss of Temporal Information: Without updating the event status for each interval, the model fails to capture the dynamic relationship between covariates and the outcome over time, which is crucial for time-dependent analyses.

Mismatch with Theoretical Foundations: The theoretical foundation of time-dependent propensity score matching, as described in the literature (e.g., Lu, 2005), emphasizes the importance of updating the propensity score and event status at each time point to reflect the evolving risk of the event. A constant event variable undermines this principle and may lead to incorrect matching and analysis.

Suggested Solution

The event status variable should be updated for each time interval to correctly indicate whether the event occurred during that specific period. This would ensure that the Cox model and subsequent analyses accurately reflect the time-dependent nature of the data.

Thank you for your attention to this issue. I believe addressing this will significantly improve the accuracy and applicability of time-dependent analyses using your package.

Best regards, [Your Name]
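The "Suggested Solution" in the quoted text could be sketched in counting-process (start, stop] form as follows, assuming treatment is the event being modeled (time to treatment), that a subject's rows after treatment are dropped, and that periods are consecutive integers. The helper `to_counting_process()`, the `time_start` column, and the toy data are hypothetical and not part of rsmatch.

```r
# Hypothetical helper: convert long data (one row per subject-period) into
# counting-process form for modeling time to treatment.
to_counting_process <- function(data) {
  # Keep rows up to and including the treatment period;
  # never-treated subjects (trt_time NA) keep all of their rows.
  keep <- is.na(data$trt_time) | data$time <= data$trt_time
  out <- data[keep, ]
  # Interval (time_start, time]; assumes consecutive integer periods.
  out$time_start <- out$time - 1
  # Event indicator: 1 only in the period in which treatment occurs.
  out$event <- as.integer(!is.na(out$trt_time) & out$time == out$trt_time)
  rownames(out) <- NULL
  out
}

# Hypothetical toy data: subject 1 treated at period 3, subject 2 never,
# subject 3 treated at period 2.
data <- data.frame(
  id       = rep(1:3, each = 4),
  time     = rep(1:4, times = 3),
  trt_time = rep(c(3, NA, 2), each = 4)
)

cp <- to_counting_process(data)
print(cp)
```

The result could then be passed to `survival::coxph()` with `survival::Surv(time_start, time, event)` as the response, so that each treated subject contributes a single event, in the interval where treatment occurred.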

eribul commented 2 months ago

On a related note, concerning the Cox call above: shouldn't it include the argument id = iid to get robust standard errors that account for the repeated measures?

eribul commented 2 months ago

Ah, sorry, I guess my last comment might not apply, since the goal at this stage is prediction, not inference. Apologies!

skent259 commented 2 months ago

The variables id, trt, and trt_time are all patient-level data (not time-level data), even in the long format.

So the logic is that if treatment time is NA, then the subject was not yet treated, and their trt variable is 0. This keeps them in the risk set for matching.
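The patient-level view can be illustrated with a one-row-per-subject table and a risk-set eligibility check in the spirit of Lu (2005). This is a sketch under the assumption that a subject is an eligible control at time t if it is never treated (trt_time NA) or treated strictly after t; the helper `risk_set_controls()` and the data are hypothetical and not rsmatch's actual implementation.

```r
# Hypothetical patient-level table: one row per subject.
patients <- data.frame(id = 1:4, trt_time = c(3, NA, 2, 5))
patients$trt <- ifelse(is.na(patients$trt_time), 0, 1)  # as on line 152

# Hypothetical helper: ids eligible as controls for a subject treated at time t.
# Eligibility depends on trt_time relative to t, so a subject with trt == 1
# can still serve as a control before its own treatment time.
risk_set_controls <- function(patients, t) {
  eligible <- is.na(patients$trt_time) | patients$trt_time > t
  patients$id[eligible]
}

risk_set_controls(patients, t = 2)  # ids 1, 2 and 4 (id 4 is treated later)
risk_set_controls(patients, t = 3)  # ids 2 and 4
```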

Does that help to clarify?

Regarding your second question, yes, we only need the prediction to get the appropriate matches.
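To see why robust errors do not matter when only predictions are needed: in survival 3.x, `coxph()` accepts `cluster` (and `id`) arguments, and supplying `cluster` makes a robust sandwich variance the default without changing the coefficient estimates. This sketch uses the `heart` data shipped with survival (counting-process format, several rows per `id`) as a stand-in for the matching data; it is an illustration, not rsmatch code.

```r
library(survival)

# Stanford heart transplant data: several (start, stop] rows per subject id.
fit_naive  <- coxph(Surv(start, stop, event) ~ age + surgery + transplant,
                    data = heart)
fit_robust <- coxph(Surv(start, stop, event) ~ age + surgery + transplant,
                    data = heart, cluster = id)

# Point estimates, and therefore linear predictors, are the same ...
all.equal(coef(fit_naive), coef(fit_robust))
all.equal(predict(fit_naive, type = "lp"), predict(fit_robust, type = "lp"))

# ... only the standard errors change.
cbind(naive  = sqrt(diag(vcov(fit_naive))),
      robust = sqrt(diag(vcov(fit_robust))))
```

Since matching only uses the predicted scores, the choice of variance estimator does not affect which matches are formed.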

skent259 commented 2 months ago

As a side note, I am curious about your use case with data.table. This seems like something we could address in the future. Maybe you can open a separate issue on this for tracking?