Question about MIMIC-iii dataset

Hello @zjs123, thanks for your question.

First, judging by the patient count of 6350, I assume that the MoleRec paper uses this GitHub repo to process the MIMIC-III data: https://github.com/ycq091044/SafeDrug.

Second, the drug_recommendation_mimic3_fn in the PyHealth package is a bit different from the data processing script in https://github.com/ycq091044/SafeDrug. The major difference is in https://github.com/sunlabuiuc/PyHealth/blob/master/pyhealth/tasks/drug_recommendation.py#L53.

In the SafeDrug repo, a visit is included if it has either diagnoses or procedures. In PyHealth, a visit is included only if it has both diagnoses and procedures, so the requirement here is stricter. There might be other minor differences.
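To make the difference concrete, here is a minimal sketch of the two inclusion rules as described above. It is not the actual code from either repo: the toy visits, the field names, and the helper functions `keep_if_either` and `keep_if_both` are all made up for illustration.

```python
# Toy visits; the field names are hypothetical, not taken from either codebase.
visits = [
    {"visit_id": "v1", "diagnoses": ["4019"], "procedures": ["3893"]},
    {"visit_id": "v2", "diagnoses": ["25000"], "procedures": []},
    {"visit_id": "v3", "diagnoses": [], "procedures": ["9604"]},
    {"visit_id": "v4", "diagnoses": [], "procedures": []},
]


def keep_if_either(visit):
    """Looser rule: keep a visit that has diagnoses or procedures."""
    return bool(visit["diagnoses"]) or bool(visit["procedures"])


def keep_if_both(visit):
    """Stricter rule: keep a visit only if it has diagnoses and procedures."""
    return bool(visit["diagnoses"]) and bool(visit["procedures"])


either_ids = [v["visit_id"] for v in visits if keep_if_either(v)]
both_ids = [v["visit_id"] for v in visits if keep_if_both(v)]

print(either_ids)  # ['v1', 'v2', 'v3']
print(both_ids)    # ['v1']
```

Visits like `v2` and `v3` pass the looser rule but are dropped by the stricter one, which is the kind of gap that would show up as fewer visits and patients after processing.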

P.S. If you look at the difference in patient counts, 6350 - 5449 = 901 patients, and the difference in visit counts, 14995 - 14141 = 854 visits, it is interesting that the patient difference is larger than the visit difference (I would have expected the other way around). Anyway, it suggests that the missing patients mostly have only one visit, which does not help in learning sequential models.
