microsoft / med-deadend

Code for the Medical Deadend Paper at NeurIPS 2021
MIT License

Processed Sepsis Cohort #2

Closed by yikuan8 2 years ago

yikuan8 commented 2 years ago

Thank you for sharing this great repo! https://github.com/microsoft/mimic_sepsis generates two files: sepsis_final_data_withTimes.csv and sepsis_final_data_RAW_withTimes.csv.

How can I derive the files ending with K1 that are required for step 0?

Thanks,

twkillian commented 2 years ago

Ah man, that's an error on our part. We've mixed up how we've historically tracked the names of the files 😵.

Thanks for bringing this up. The *.csv files you've generated are correct. I'll update our README accordingly.

yikuan8 commented 2 years ago

Thank you so much for the clarification. One more follow-up question from an RL newbie. The KNN step is used to impute the missing values at each timestamp. You did not impute observations for completely missing timestamps, am I correct? I ask because I saw the number of steps per trajectory ranging from 1 to 20.

twkillian commented 2 years ago

You're correct. Each patient trajectory is extracted based on a presumed onset of sepsis. All observations that occur at most 24 hours before and up to 48 hours after this onset are included. Many times the patient is either discharged (recovered from sepsis) or dies before the full 72 hours of the extracted trajectory are fulfilled. It wouldn't make sense to fill in timestamps after this terminal condition occurs.
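The windowing rule described above can be sketched in a few lines. This is only an illustration, assuming each observation is a (timestamp, values) pair; the function name extract_window and the sample data are hypothetical and are not taken from the repo's preprocessing code.

```python
from datetime import datetime, timedelta


def extract_window(observations, onset, hours_before=24, hours_after=48):
    """Keep observations from `hours_before` before onset to `hours_after` after it.

    `observations` is a list of (timestamp, values) pairs (hypothetical layout).
    """
    start = onset - timedelta(hours=hours_before)
    end = onset + timedelta(hours=hours_after)
    return [(t, v) for t, v in observations if start <= t <= end]


# Made-up example: four observations around a presumed onset time.
onset = datetime(2021, 1, 2, 12, 0)
observations = [
    (onset - timedelta(hours=30), {"HR": 80}),   # too early, dropped
    (onset - timedelta(hours=20), {"HR": 95}),   # kept
    (onset + timedelta(hours=10), {"HR": 110}),  # kept
    (onset + timedelta(hours=50), {"HR": 90}),   # too late, dropped
]
window = extract_window(observations, onset)
```

A trajectory that ends early (discharge or death) simply has fewer observations inside the window; nothing is filled in after the terminal event.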

To handle the unequal lengths of trajectories, we can zero pad the ends of the trajectories after the terminal condition when dealing with them in their entirety (e.g. in a batch setting for recurrent models). An example of how this is done with this data can be found at https://github.com/MLforHealth/rl_representations/blob/main/scripts/split_sepsis_cohort.py
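For readers who want a quick picture of the padding idea before opening the linked script, here is a minimal sketch. It assumes each trajectory is a list of equal-width feature vectors; the zero_pad helper and the toy data are hypothetical and are not the implementation in split_sepsis_cohort.py.

```python
def zero_pad(trajectories, max_len=None):
    """Pad every trajectory with all-zero steps up to a common length.

    Returns the padded trajectories and the original lengths, which a
    recurrent model can use to mask out the padded steps.
    """
    if max_len is None:
        max_len = max(len(traj) for traj in trajectories)
    n_features = len(trajectories[0][0])

    padded, lengths = [], []
    for traj in trajectories:
        if len(traj) > max_len:
            raise ValueError("trajectory longer than max_len")
        # Zero vectors appended after the terminal step.
        pad = [[0.0] * n_features for _ in range(max_len - len(traj))]
        padded.append([list(step) for step in traj] + pad)
        lengths.append(len(traj))
    return padded, lengths


# Toy data: one 3-step and one 1-step trajectory with 2 features each.
trajectories = [
    [[0.5, 1.0], [0.6, 1.1], [0.7, 1.2]],
    [[0.2, 0.9]],
]
padded, lengths = zero_pad(trajectories)
```

Keeping the original lengths alongside the padded data lets downstream code tell real observations apart from padding.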

yikuan8 commented 2 years ago

Ohhh, that is really helpful!!! Did you apply the zero padding in your implementation? I found a few trajectories with 20 steps. However, a 72-hour period with a 4-hour interval should have a cap of 19 steps. May I ask why this happens, or whether there are some extra processing steps?
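The expected cap in the question above can be checked with a little arithmetic. This sketch assumes the 4-hour interval mentioned in the question and the 24-hour/48-hour window described earlier in the thread; it only reproduces the count and does not explain the 20-step trajectories.

```python
hours_before, hours_after = 24, 48
step_hours = 4

window_hours = hours_before + hours_after  # 72 hours in total
n_bins = window_hours // step_hours        # non-overlapping 4-hour bins
n_grid_points = n_bins + 1                 # grid points if both endpoints count

print(n_bins, n_grid_points)  # prints "18 19"
```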

dmasamba commented 2 months ago

Hello, I know the issue is already closed and thank you so much for updating the README, but the script data_process.py still has the following line:

(line 10) data_file = r"./data/sepsis_mimiciii/sepsis_final_data_K1.csv" instead of using sepsis_final_data_withTimes.csv

That's why I ended up here lol, but from the responses in this issue I now understand what to do. I just wanted to bring it up in case someone else also runs into this.

dmasamba commented 2 months ago

Never mind lol. It's probably better to rename both data files, "sepsis_final_data_withTimes.csv" and "sepsis_final_data_RAW_withTimes.csv", to "sepsis_final_data_K1.csv" and "sepsis_final_data_K1_RAW.csv", because it looks like other scripts (probably more than 2) still use the K1 naming.
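The renaming suggested above could be scripted roughly as follows. This is a sketch under assumptions: the data directory is taken from the path quoted from data_process.py, the helper name add_k1_copies is hypothetical, and it copies rather than renames so that the original *_withTimes.csv files stay in place.

```python
import shutil
from pathlib import Path

# Old name -> K1-style name, as listed in the comment above.
K1_NAMES = {
    "sepsis_final_data_withTimes.csv": "sepsis_final_data_K1.csv",
    "sepsis_final_data_RAW_withTimes.csv": "sepsis_final_data_K1_RAW.csv",
}


def add_k1_copies(data_dir):
    """Copy the *_withTimes.csv files to their K1 names inside `data_dir`.

    Existing K1 files are left untouched. Returns the names that were created.
    """
    data_dir = Path(data_dir)
    created = []
    for src_name, dst_name in K1_NAMES.items():
        src = data_dir / src_name
        dst = data_dir / dst_name
        if src.exists() and not dst.exists():
            shutil.copyfile(src, dst)
            created.append(dst_name)
    return created


if __name__ == "__main__":
    # Directory assumed from the data_file path quoted earlier in the thread.
    print(add_k1_copies("./data/sepsis_mimiciii"))
```

Copying doubles the disk usage for these two files; swapping shutil.copyfile for Path.rename would perform an actual rename instead.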