mbilos / neural-flows-experiments

Experiments for Neural Flows paper
https://arxiv.org/abs/2110.13040

Some severe issues with the MIMIC-IV preprocessing #7

Open randolf-scholz opened 2 years ago

randolf-scholz commented 2 years ago

While reproducing the preprocessing, I noticed a few severe issues with the provided notebooks.

datamerging.ipynb - Prescriptions are accidentally dropped completely

presc_df = presc_df.drop((presc_df['valuenum']=='3-10').index)

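Afterwards, the table is empty: .index on the boolean comparison returns the index of every row, not only the matching ones, so all rows are dropped.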
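A minimal sketch of the problem and one possible fix, using a toy presc_df rather than the real prescriptions table (the three sample values are made up for illustration):

```python
import pandas as pd

# Toy stand-in for the prescriptions table; values are illustrative only.
presc_df = pd.DataFrame({"valuenum": ["1", "3-10", "2.5"]})

mask = presc_df["valuenum"] == "3-10"

# mask.index is the index of the whole frame, so this drops every row.
dropped_all = presc_df.drop(mask.index)

# Selecting the True entries first drops only the matching rows.
fixed = presc_df.drop(mask[mask].index)

# Equivalent and arguably clearer: keep the rows that do not match.
fixed_alt = presc_df[~mask]
```

Either of the last two forms keeps the rows whose valuenum is not "3-10".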

outputs.ipynb - Wrong labels!

outputs_label_list contains the entries "Chest Tube" and "Jackson Pratt", but these never appear as labels in the data. The correct labels are "Chest Tube #1" and "Jackson Pratt #1".
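A quick way to catch this kind of mismatch is to compare the requested labels against the labels that actually occur. The sketch below uses a toy frame; the column name "label" and the extra "Foley" entry are assumptions for illustration, not taken from the notebook:

```python
import pandas as pd

# Toy stand-in for the outputs table; the "label" column name is assumed.
outputs = pd.DataFrame(
    {"label": ["Chest Tube #1", "Jackson Pratt #1", "Foley"]}
)

# Requested labels, including the two that never occur verbatim.
outputs_label_list = ["Chest Tube", "Jackson Pratt", "Foley"]

present = set(outputs["label"].unique())

# Labels that were requested but do not exist in the data.
missing = [lab for lab in outputs_label_list if lab not in present]
```

An assertion such as `assert not missing, missing` before filtering would surface the problem immediately.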

prescriptions.ipynb - missing required filtering

inputevents.ipynb

In some cases, the code for adding repeats does not add enough of them because of a rounding issue. This can be tested via

min_diff = (pd.to_datetime(df_new1["endtime"])-df_new1["charttime"]).groupby(level=0).min()
assert all(min_diff < pd.Timedelta("30min")), "Did not add enough steps!"
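One way to sidestep float rounding when computing the number of repeats is integer floor division on timedeltas. This is only a sketch of that idea, not the notebook's code; repeat_times and the 30 minute STEP constant are hypothetical names, and the function assumes end is not earlier than start:

```python
import pandas as pd

STEP = pd.Timedelta("30min")  # step size implied by the check above

def repeat_times(start, end, step=STEP):
    """Return start, start+step, ... so that end - last < step.

    Assumes end >= start.
    """
    # Timedelta // Timedelta is exact integer floor division.
    n_steps = (end - start) // step
    return [start + k * step for k in range(n_steps + 1)]

start = pd.Timestamp("2020-01-01 00:00")
end = pd.Timestamp("2020-01-01 01:45")
times = repeat_times(start, end)
```

With these toy timestamps the last generated time is 01:30, leaving 15 minutes until end, which satisfies the 30 minute bound.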

labevents.ipynb

admissions.ipynb

We filter for patients with a single admission. However, the other dataframes are later filtered by hadm_id instead of subject_id. The problem is that at least one table appears to contain corrupted data, which gives rise to hadm_id values with multiple subject_id values associated with them. We can test this in datamerging via

assert all(merged_df.groupby("subject_id")["hadm_id"].nunique() == 1)
assert all(merged_df.groupby("hadm_id")["subject_id"].nunique() == 1)

Further, the data is restricted to patients with a hospital stay of 2-29 days. However, charttime does not agree with this restriction: sometimes charttime starts before admittime, and the longest charttime is over 52 years.

randolf-scholz commented 2 years ago

A further issue: Standardization is applied column-wise

https://github.com/mbilos/neural-flows-experiments/blob/bd19f7c92461e83521e268c1a235ef845a3dd963/nfe/experiments/gru_ode_bayes/lib/get_data.py#L50-L53

But the Famotidine (Pepcid) and Metronidazole columns have zero standard deviation, so standardizing them divides by zero.
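A small sketch of the effect and one common workaround. It is not the repository's code: the data is made up, "Heart Rate" is a hypothetical second column, and replacing a zero standard deviation with 1 is just one possible choice (dropping constant columns is another):

```python
import pandas as pd

# Toy data: one constant column and one varying column.
df = pd.DataFrame(
    {
        "Famotidine (Pepcid)": [1.0, 1.0, 1.0],
        "Heart Rate": [60.0, 80.0, 100.0],
    }
)

mean = df.mean()
std = df.std()

# Naive column-wise standardization: the constant column becomes NaN (0/0).
naive = (df - mean) / std

# Columns that cannot be standardized this way.
constant_cols = std[std == 0].index.tolist()

# Workaround (an assumption, not the authors' method): divide by 1 instead.
safe_std = std.replace(0.0, 1.0)
safe = (df - mean) / safe_std
```

With the workaround, constant columns are mapped to all zeros instead of NaN.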