matthieukomorowski / AI_Clinician

Reinforcement learning for medical decisions

Problem Reproducing Cohort #8

Open wjxgeorge opened 5 years ago

wjxgeorge commented 5 years ago

I'm currently working on a Python version of the data preprocessing code, and I'm having trouble reproducing the cohort listed in patientIDs_MIMIC3.csv.

Some hadmid values corresponding to icustayid values in patientIDs_MIMIC3.csv are not in the abx.csv file in the first place. For example, icustayid 55 corresponds to hadmid 147080, which is not returned even when I query PhysioNet's MIMIC-III database directly.
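One way to list every affected admission is to map the cohort's ICU stays to admissions and keep those whose hadmid never appears in abx.csv. The sketch below is a hypothetical check, not code from this repository: the column names `icustayid` and `hadmid` and the inline CSV text are assumptions standing in for the real files, and only the 55 / 147080 pair comes from the report above.

```python
import csv
import io

# Hypothetical stand-ins for the real files: only the icustayid 55 -> hadmid 147080
# pair comes from the report above; every other value is made up for illustration.
COHORT_CSV = "icustayid\n55\n56\n"
ICUSTAYS_CSV = "icustayid,hadmid\n55,147080\n56,100001\n"
ABX_CSV = "hadmid,drug\n100001,Vancomycin\n"


def read_column(text, column):
    """Return the values of one column of a CSV string as a list of ints."""
    return [int(row[column]) for row in csv.DictReader(io.StringIO(text))]


def cohort_admissions_without_abx(cohort_text, icustays_text, abx_text):
    """Map cohort ICU stays to admissions and return those with no antibiotic rows."""
    cohort = set(read_column(cohort_text, "icustayid"))
    stay_to_hadm = {
        int(row["icustayid"]): int(row["hadmid"])
        for row in csv.DictReader(io.StringIO(icustays_text))
    }
    abx_hadm = set(read_column(abx_text, "hadmid"))
    # Keep (icustayid, hadmid) pairs whose admission never appears in the abx extract.
    return sorted(
        (stay, stay_to_hadm[stay])
        for stay in cohort
        if stay in stay_to_hadm and stay_to_hadm[stay] not in abx_hadm
    )


print(cohort_admissions_without_abx(COHORT_CSV, ICUSTAYS_CSV, ABX_CSV))
# prints [(55, 147080)]
```

With real data, the three strings would be replaced by the contents of the corresponding files read from disk.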

Can anyone reproduce it using the MATLAB code?

paulrich1234 commented 5 years ago

Hi Nephalen, I think there are clearly structured instructions for working with the MIMIC database here: https://github.com/alistairewj/sepsis3-mimic

wjxgeorge commented 5 years ago


I'm talking about the code in this repository. I've verified my MIMIC-III installation.

shengpu-tang commented 5 years ago

Hi Nephalen, I am experiencing the same issue using the provided MATLAB code. For example, ICU stay IDs 200035 and 299994 do not have any antibiotic prescriptions in the database (and thus do not meet the sepsis criteria), yet they are included in patientIDs_MIMIC3.csv.
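The reason a stay without antibiotics cannot qualify can be illustrated with the Sepsis-3 style suspected-infection rule: an antibiotic and a culture must both occur, with the culture following the antibiotic within 24 hours or the antibiotic following the culture within 72 hours. The sketch below is a simplified illustration of that rule under those assumed windows, not this repository's implementation; times are plain hours from an arbitrary reference point, and the function name is hypothetical.

```python
from typing import Optional, Sequence

ABX_THEN_CULTURE_H = 24.0  # assumed: culture must follow an antibiotic within 24 h
CULTURE_THEN_ABX_H = 72.0  # assumed: antibiotic must follow a culture within 72 h


def suspected_infection_onset(
    abx_times: Sequence[float], culture_times: Sequence[float]
) -> Optional[float]:
    """Return the earliest suspected-infection onset (hours), or None if never met.

    Onset is the earlier event of any qualifying antibiotic/culture pair.
    """
    onsets = []
    for abx in abx_times:
        for culture in culture_times:
            if abx <= culture <= abx + ABX_THEN_CULTURE_H:
                onsets.append(abx)  # antibiotic first, culture within the window
            elif culture <= abx <= culture + CULTURE_THEN_ABX_H:
                onsets.append(culture)  # culture first, antibiotic within the window
    return min(onsets) if onsets else None


# A stay with cultures but no antibiotic rows can never qualify.
print(suspected_infection_onset([], [3.0, 10.0]))  # prints None
print(suspected_infection_onset([5.0], [3.0]))     # prints 3.0
print(suspected_infection_onset([5.0], [40.0]))    # prints None
```

Because the loop runs over antibiotic times, an empty antibiotic list always yields None, which is why stays like the two above should not appear in a sepsis cohort built this way.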

ZhiliangWu commented 5 years ago

Hi @Nephalen and @shengpu1126, I have similar issues to the ones you reported.

A possible reason for finding no common IDs is the offset of 200,000 added to the published IDs. The largest subject_id in MIMIC's icustays table is 99999, which is smaller than this 200,000 offset, so the published IDs cannot be subject_ids.
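The ID arithmetic can be checked directly. MIMIC-III icustay_id values run from 200001 to 299999, while subject_id tops out at 99999, so anything at or above 200,000 cannot be a subject_id. The sketch below assumes, as the comment suggests, that one side of the comparison stores icustay_id minus 200,000 (so icustayid 55 from the first post would be icustay_id 200055); the helper names are hypothetical.

```python
ICUSTAY_OFFSET = 200_000                 # offset discussed in the comment above
MAX_SUBJECT_ID = 99_999                  # largest subject_id in MIMIC-III icustays
FULL_ID_RANGE = range(200_001, 300_000)  # valid MIMIC-III icustay_id values
STRIPPED_ID_RANGE = range(1, 100_000)    # icustay_id values with the offset removed


def to_full_icustay_id(raw_id: int) -> int:
    """Return a full icustay_id, re-adding the offset if it was stripped (assumed)."""
    if raw_id in FULL_ID_RANGE:
        return raw_id
    if raw_id in STRIPPED_ID_RANGE:
        # Assumption: small IDs are icustay_id - 200,000, not subject_ids.
        return raw_id + ICUSTAY_OFFSET
    raise ValueError(f"{raw_id} is not a recognizable ICU stay ID")


def could_be_subject_id(raw_id: int) -> bool:
    """IDs above the largest subject_id cannot refer to patients."""
    return 1 <= raw_id <= MAX_SUBJECT_ID


print(to_full_icustay_id(55))       # prints 200055
print(to_full_icustay_id(299994))   # prints 299994
print(could_be_subject_id(200035))  # prints False
```

Normalizing every ID list this way before joining avoids comparing offset-stripped IDs against full ones.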

In short, PatientID refers to icustayids. In my experience, though, running the provided code on the current MIMIC-III database produces a slightly different cohort. Please correct me if I did something wrong.
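To put a number on "slightly different", one option is to compare the published and reproduced ICU stay IDs as sets once both use full icustay_id values. The sketch below is a hypothetical helper with made-up lists: 200035 and 299994 are the stays mentioned earlier in the thread, while 200100 and 200200 are invented for illustration.

```python
from typing import Iterable


def compare_cohorts(published: Iterable[int], reproduced: Iterable[int]) -> dict:
    """Summarize the overlap between two cohorts of ICU stay IDs."""
    pub, rep = set(published), set(reproduced)
    union = pub | rep
    return {
        "only_published": sorted(pub - rep),
        "only_reproduced": sorted(rep - pub),
        "shared": len(pub & rep),
        # Jaccard index: 1.0 means the two cohorts are identical.
        "jaccard": len(pub & rep) / len(union) if union else 1.0,
    }


summary = compare_cohorts([200035, 200100, 299994], [200100, 200200])
print(summary)
# {'only_published': [200035, 299994], 'only_reproduced': [200200], 'shared': 1, 'jaccard': 0.25}
```

Listing the IDs on each side, rather than only the counts, makes it easier to trace each discrepancy back to a specific inclusion criterion.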