Open sumeromer opened 1 year ago
Here is a quick update regarding the issue:
Later, I checked the paper and noticed that the cited reference for MIMIC-IV-ED is version 1.0, whereas I was using the most recent version of the data. I switched back to v1.0, and preprocessing worked: it raised many pandas warnings but no errors, and it created all the files under interim and processed that are needed for training.
Then, keeping the same configuration and changing nothing else, I trained the CAMELOT model by running python src/training/run_model.py. It produces a lot of training and validation logs. The model is definitely learning and optimizing the loss terms, and the AU-ROC is around the reported values (0.711 and 0.718 in two different runs).
The problem is that the confusion matrix printed at the end of training is very bad. The model predicts almost only Ward and hardly ever any other class. Below, I share the output at the end of two runs. It should not be possible to have good AU-ROC scores while the predictions are all wrong. The confusion matrices below are wrong for nearly every sample in the Death, ICU, and Discharge classes. I think there is an issue with one of these metrics.
Scoring information for this experiment
ROC-AUC value: [0.69488729 0.75383878 0.72695957 0.67002434]
F1 value: [0. 0.02402402 0.86579287 0. ]
Recall value: [0. 0.01226994 0.99787234 0. ]
Precision value: [0. 0.57142857 0.76459081 0. ]
ARI value: 0.011248643132512373
NMI value: 0.006368103818763199
Silhouette value: 0.2257734007305569
DBI value: 1.686989537706597
VRI value: 24.53473699159825
Purity value: 0.08489307223484438
Confusion Matrix for predicting results
Predicted Class De I W Di
True Class
De 0 1 19 0
I 0 8 644 0
W 0 5 2345 0
Di 0 0 59 0
---
Scoring information for this experiment
ROC-AUC value: [0.73703038 0.68693313 0.65729866 0.5604774 ]
F1 value: [0. 0.01213961 0.86541298 0. ]
Recall value: [0. 0.00613497 0.9987234 0. ]
Precision value: [0. 0.57142857 0.76350033 0. ]
ARI value: 0.0043416573852426464
NMI value: 0.0022140455605365052
Confusion Matrix for predicting results
Predicted Class De I W Di
True Class
De 0 0 20 0
I 0 4 648 0
W 0 3 2347 0
Di 0 0 59 0
---
Scoring information for this experiment
ROC-AUC value: [0.74459327 0.73911226 0.70786972 0.5903235 ]
F1 value: [0. 0.06451613 0.86761711 0. ]
Recall value: [0. 0.03374233 0.99702128 0. ]
Precision value: [0. 0.73333333 0.76794494 0. ]
ARI value: 0.030210147661374066
NMI value: 0.01947228477908028
Silhouette value: 0.20218332008355194
DBI value: 5.401939838385961
VRI value: 41.10279826766778
Purity value: 0.08528976883407263
Confusion Matrix for predicting results
Predicted Class De I W Di
True Class
De 0 1 19 0
I 0 22 630 0
W 0 7 2343 0
Di 0 0 59 0
P.S.: I changed the seeds in both data_config and training_config; all runs gave outcomes similar to those shared above.
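As a sanity check on the numbers above, the per-class precision, recall, and F1 can be recomputed directly from the last confusion matrix. This is a minimal sketch, not code from the CAMELOT repository: the helper name per_class_metrics is made up for illustration, and it assumes rows are true classes and columns are predicted classes, as the printed header suggests.

```python
# Last confusion matrix from the output above (rows: true class, columns: predicted class).
labels = ["De", "I", "W", "Di"]
cm = [
    [0, 1, 19, 0],
    [0, 22, 630, 0],
    [0, 7, 2343, 0],
    [0, 0, 59, 0],
]


def per_class_metrics(cm):
    """Return a (precision, recall, f1) tuple for each class of a square confusion matrix.

    Hypothetical helper for illustration; undefined ratios are reported as 0.0.
    """
    n = len(cm)
    results = []
    for k in range(n):
        tp = cm[k][k]
        actual = sum(cm[k])                          # samples whose true class is k
        predicted = sum(cm[r][k] for r in range(n))  # samples predicted as class k
        recall = tp / actual if actual else 0.0
        precision = tp / predicted if predicted else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        results.append((precision, recall, f1))
    return results


for name, (precision, recall, f1) in zip(labels, per_class_metrics(cm)):
    print(f"{name}: precision={precision:.8f} recall={recall:.8f} f1={f1:.8f}")
```

For this run, the recomputed values agree with the printed F1, Recall, and Precision lines, which suggests that those three metrics and the confusion matrix come from the same hard predictions. The ROC-AUC line is the only one that is presumably computed from something else, namely the predicted probabilities.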
Hi @sumeromer, thanks for raising this. This was also raised internally, and I will come back to fix the errors within the next few weeks once I have some time. In the meantime, some comments:
Hi @hrna-ox, thank you very much for your swift reply.
Please do not spend time on the preprocessing issue: everything works in preprocessing with version 1.0, the version you used in the paper. I can help debug your preprocessing script and make it run on the latest MIMIC release.
Could you have a look at the performance metrics and the results? The model never predicts death or discharge correctly, and nearly all predictions are ward. It is therefore more urgent to understand whether there is a flaw in the evaluation of your approach.
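One possibility worth ruling out before looking for a bug: AU-ROC depends only on how the predicted probabilities rank the samples, while a confusion matrix is usually built from the argmax class. With imbalanced classes the two can disagree. The toy example below uses invented numbers (not CAMELOT output) and a hand-written auroc helper that is hypothetical, not the repository's metric code. It assumes the confusion matrix is built from argmax predictions, which this thread does not confirm.

```python
def auroc(scores, labels):
    """AU-ROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Hypothetical helper for illustration.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy two-class problem: 8 majority samples (label 0, think "Ward")
# and 2 minority samples (label 1, think "ICU"). All values are invented.
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
# Predicted probability of the minority class; it never exceeds 0.5.
p_icu = [0.05, 0.10, 0.10, 0.15, 0.20, 0.20, 0.25, 0.40, 0.30, 0.45]

# With two classes, argmax is the same as thresholding at 0.5.
predicted = [int(p > 0.5) for p in p_icu]

icu_auroc = auroc(p_icu, labels)
true_positives = sum(1 for p, y in zip(predicted, labels) if p == 1 and y == 1)
icu_recall = true_positives / sum(labels)

print("AU-ROC for the minority class:", icu_auroc)
print("Recall for the minority class:", icu_recall)
```

In this toy case the minority class is ranked well but never predicted, because its probability stays below that of the majority class. If the same thing is happening here, the metrics would be consistent with each other, and the thing to look at would be the decision rule or the class weighting rather than the evaluation code.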
I read your paper in detail and want to reproduce your results. However, the requirements file is incomplete, and the README does not clearly describe which files to run or how to run them. Could you clarify and help me with the issue below?
Since I use the original dataset (in which the "entrance date/time" comes before the exit), those assertions should pass, but they fail.
I put MIMIC-IV-ED v2.2 under /data/MIMIC and also copied the patients.csv and transfers.csv files from MIMIC-IV v2.2 into another folder named /data/MIMIC/core.
I run the following script to create a processed version of the dataset according to your scheme.
However, this script gives the following error:
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  admissions_ed_S3.loc[:, PATIENT_INFO] = patients_core.set_index("subject_id").loc[patients_S3, PATIENT_INFO].values
/home/omersumer/Desktop/ts-ehr/external/camelot-icml/src/data_processing/MIMIC/admissions_processing.py:130: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  admissions_ed_S3.loc[:, "next" + col] = transfers_to_relevant_wards.set_index("subject_id").loc[
/home/omersumer/Desktop/ts-ehr/external/camelot-icml/src/data_processing/MIMIC/admissions_processing.py:130: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  admissions_ed_S3.loc[:, "next" + col] = transfers_to_relevant_wards.set_index("subject_id").loc[
/home/omersumer/Desktop/ts-ehr/external/camelot-icml/src/data_processing/MIMIC/admissions_processing.py:130: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  admissions_ed_S3.loc[:, "next" + col] = transfers_to_relevant_wards.set_index("subject_id").loc[
/home/omersumer/Desktop/ts-ehr/external/camelot-icml/src/data_processing/MIMIC/admissions_processing.py:130: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  admissions_ed_S3.loc[:, "next" + col] = transfers_to_relevant_wards.set_index("subject_id").loc[
/home/omersumer/Desktop/ts-ehr/external/camelot-icml/src/data_processing/MIMIC/admissions_processing.py:130: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  admissions_ed_S3.loc[:, "next" + col] = transfers_to_relevant_wards.set_index("subject_id").loc[
/home/omersumer/Desktop/ts-ehr/external/camelot-icml/src/data_processing/MIMIC/admissions_processing.py:134: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  admissions_ed_S3["age"] = admissions_ed_S3.intime.dt.year - admissions_ed_S3["anchor_year"] + admissions_ed_S3[
Traceback (most recent call last):
  File "src/data_processing/MIMIC/run_processing.py", line 37, in <module>
    main()
  File "src/data_processing/MIMIC/run_processing.py", line 26, in main
    vitals_processing.main()
  File "/home/omersumer/Desktop/ts-ehr/external/camelot-icml/src/data_processing/MIMIC/vitals_processing.py", line 79, in main
    test.admissions_processed_correctly(admissions)
  File "/home/omersumer/Desktop/ts-ehr/external/camelot-icml/src/data_processing/MIMIC/test.py", line 75, in admissions_processed_correctly
    assert test_entrance_before_exit(df["intime"], df["outtime"])
AssertionError
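To see which rows trip the assertion, one option is to list the stays whose intime is missing, whose outtime is missing, or whose intime is after outtime. The sketch below is only a guess at what test_entrance_before_exit checks, since its body is not shown in this thread. The function name find_bad_stays, the stay_id column in the toy frame, and the toy timestamps are all made up for illustration.

```python
import pandas as pd


def find_bad_stays(df, start_col="intime", end_col="outtime"):
    """Return rows with a missing start, a missing end, or a start later than the end.

    Hypothetical helper; the real check in test.py may use a different rule.
    """
    start = pd.to_datetime(df[start_col], errors="coerce")
    end = pd.to_datetime(df[end_col], errors="coerce")
    bad = start.isna() | end.isna() | (start > end)
    return df.loc[bad]


# Toy frame with invented values: stay 2 ends before it starts, stay 3 has no outtime.
toy = pd.DataFrame(
    {
        "stay_id": [1, 2, 3],
        "intime": ["2150-01-01 10:00:00", "2150-01-02 12:00:00", "2150-01-03 09:00:00"],
        "outtime": ["2150-01-01 18:00:00", "2150-01-02 11:00:00", None],
    }
)

bad_rows = find_bad_stays(toy)
print(bad_rows)
```

Running something like this on the admissions frame just before the assertion would show whether the failure comes from a handful of rows in the newer release or from a column that is parsed differently in v2.2.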