lrsoenksen / HAIM

This repository contains the code to replicate the data processing, modeling and reporting of our Holistic AI in Medicine (HAIM) Publication in Nature Machine Intelligence (Soenksen LR, Ma Y, Zeng C et al. 2022).
Apache License 2.0

Input window for time-series data #4

Closed ChantalMP closed 1 year ago

ChantalMP commented 1 year ago

Hi,

Thanks for your exciting work!

I was wondering if all data throughout the patient's stay is used to form the patient embedding.

Especially for Mortality and Discharge prediction, the paper mentions the labels are defined relative to patient admission. Does this mean no time-series data is used as it does not yet exist for the patient? Or is the entire time-series data used? If the complete data is used, wouldn't the length of the time-series records alone have a strong correlation to the final output label?

Thanks a lot in advance,

Chantal

lrsoenksen commented 1 year ago

Hi Chantal,

Thanks for the kind words. All the data before the prediction time is used for each patient's embedding generation. This includes the time series. If you look at how we process time series, we end up using time-series statistics as the time-series features; this means the trends of all signals matter more than their length per se. That said, time-series length will definitely correlate with mortality, since a complicated patient who stays in the hospital for a long time is more likely to die than one who has just been admitted. We did not conduct sensitivity analyses on time-series length and mortality class, but I suspect that length of stay and mortality are correlated, and for a good reason. Decoupling them may actually not be advantageous if you think about it carefully. Thanks for your thoughtful note, and I hope this helps.
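To make the idea of statistics-based time-series features concrete, here is a minimal sketch. It is not the HAIM code: the function name `summarize_series`, the chosen statistics, and the strict `t < prediction_time` cutoff are all assumptions made for illustration. It reduces one signal to a fixed-size summary using only samples recorded before the prediction time.

```python
from statistics import mean, pstdev


def summarize_series(times, values, prediction_time):
    """Summarize one signal using only samples recorded before prediction_time.

    Hypothetical helper for illustration; not taken from the HAIM repository.
    """
    # Assumption: drop anything at or after the prediction time (no look-ahead).
    pairs = [(t, v) for t, v in zip(times, values) if t < prediction_time]
    if not pairs:
        return {"mean": None, "min": None, "max": None, "std": None, "slope": None}

    ts = [t for t, _ in pairs]
    vs = [v for _, v in pairs]

    # Least-squares slope of value against time, as a simple trend measure.
    t_bar, v_bar = mean(ts), mean(vs)
    denom = sum((t - t_bar) ** 2 for t in ts)
    slope = (
        sum((t - t_bar) * (v - v_bar) for t, v in pairs) / denom if denom else 0.0
    )

    return {
        "mean": v_bar,
        "min": min(vs),
        "max": max(vs),
        "std": pstdev(vs),  # population standard deviation
        "slope": slope,
    }


# Usage: hours since admission and heart-rate readings; the sample at t = 10
# falls after the prediction time (t = 4) and is ignored.
features = summarize_series([0, 1, 2, 3, 10], [80, 82, 84, 86, 200], prediction_time=4)
```

The summary has the same size regardless of how many samples the record contains, so record length does not enter the features directly. It can still influence them indirectly, for example through how stable the estimated statistics are.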

ChantalMP commented 1 year ago

Thanks for your quick and helpful reply. Just one clarifying question: what do you define as the prediction time? For example, for 48h-discharge prediction, is the prediction time at the beginning or the end of these 48 hours?

My concern was less about a long stay indicating a complicated patient; that makes total sense. Rather, if the prediction time is always the time of admission, patients with a record longer than 48 hours cannot have been discharged or have died within the first 48 hours of their stay, while patients with a record shorter than 48 hours probably either died or were discharged.

Thanks again!

lrsoenksen commented 1 year ago

We define the prediction time as the beginning of these 48 hours. That is, we make the prediction at, say, t = N, whereas the binary mortality label is obtained simply by looking at the patient's state at t = N + 48 hrs. I believe that for this specific task, if a patient was discharged "alive" less than 48 hrs after admission, we labeled them as alive at 48 hrs after the prediction time, unless they had a new hospitalization within 48 hrs of that discharge and died within 48 hrs of the admission time of the first hospitalization. Hope this helps.
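The labeling rule described above can be sketched as follows. This is an illustrative reading of the comment, not the HAIM implementation: the function name `mortality_label`, the inclusive window boundaries, and the assumption that a patient-level time of death is available (which would also cover a death during a readmission) are all hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional


def mortality_label(
    prediction_time: datetime,
    death_time: Optional[datetime],
    horizon_hours: int = 48,
) -> int:
    """Return 1 if the patient died within the horizon after prediction_time.

    Hypothetical helper for illustration; not taken from the HAIM repository.
    """
    # A patient with no recorded death (e.g. discharged alive) is labeled 0.
    if death_time is None:
        return 0
    # Assumption: both ends of the window [N, N + horizon] are inclusive.
    window_end = prediction_time + timedelta(hours=horizon_hours)
    return int(prediction_time <= death_time <= window_end)


# Usage: the prediction is made at t = N; the label reflects the state at N + 48 h.
n = datetime(2020, 1, 1, 8, 0)
label_died_early = mortality_label(n, n + timedelta(hours=30))
label_died_late = mortality_label(n, n + timedelta(hours=72))
label_discharged_alive = mortality_label(n, None)
```

Because the label depends only on the prediction time and the time of death, a patient discharged alive before the 48 hours elapse is labeled as alive unless a death is recorded within the window.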

ChantalMP commented 1 year ago

Thanks a lot :)