hrna-ox / camelot-icml

Code for submission XXXX
MIT License

Data processing issue (test_entrance_before_exit) and reproducibility (confusion matrices) #2

Open sumeromer opened 1 year ago

sumeromer commented 1 year ago

I read your paper in detail and want to reproduce your results. However, the requirements file is incomplete, and the readme does not clearly describe which files to run or how to run them. Can you clarify and help me with the issue below?

I am using the original dataset, in which the entrance date/time precedes the exit date/time, so the test_entrance_before_exit assertion should pass, but it fails.

  1. I put MIMIC-IV-ED v2.2 under /data/MIMIC and copied the patients.csv and transfers.csv files from MIMIC-IV v2.2 into a separate folder, /data/MIMIC/core.

  2. I ran the following script to create a processed version of the dataset according to your scheme.

    python src/data_processing/MIMIC/run_processing.py

    However, this script gives the following error:

    
    Current Directory:  /home/omersumer/Desktop/ts-ehr/external/camelot-icml
    100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 205504/205504 [03:25<00:00, 999.12it/s]
    100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 205504/205504 [03:23<00:00, 1010.21it/s]
    100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 58.23it/s]
    100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 299712/299712 [05:40<00:00, 879.33it/s]
    100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 86.70it/s]
    100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 47.20it/s]
    100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 112745/112745 [01:27<00:00, 1290.79it/s]
    100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 251.50it/s]
    /home/omersumer/Desktop/ts-ehr/external/camelot-icml/src/data_processing/MIMIC/admissions_processing.py:127: SettingWithCopyWarning:
    A value is trying to be set on a copy of a slice from a DataFrame.
    Try using .loc[row_indexer,col_indexer] = value instead

    See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
      admissions_ed_S3.loc[:, PATIENT_INFO] = patients_core.set_index("subject_id").loc[patients_S3, PATIENT_INFO].values
    /home/omersumer/Desktop/ts-ehr/external/camelot-icml/src/data_processing/MIMIC/admissions_processing.py:130: SettingWithCopyWarning:
    A value is trying to be set on a copy of a slice from a DataFrame.
    Try using .loc[row_indexer,col_indexer] = value instead

    See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
      admissions_ed_S3.loc[:, "next_" + col] = transfers_to_relevant_wards.set_index("subject_id").loc[
    /home/omersumer/Desktop/ts-ehr/external/camelot-icml/src/data_processing/MIMIC/admissions_processing.py:130: SettingWithCopyWarning:
    A value is trying to be set on a copy of a slice from a DataFrame.
    Try using .loc[row_indexer,col_indexer] = value instead

    See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
      admissions_ed_S3.loc[:, "next_" + col] = transfers_to_relevant_wards.set_index("subject_id").loc[
    /home/omersumer/Desktop/ts-ehr/external/camelot-icml/src/data_processing/MIMIC/admissions_processing.py:130: SettingWithCopyWarning:
    A value is trying to be set on a copy of a slice from a DataFrame.
    Try using .loc[row_indexer,col_indexer] = value instead

    See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
      admissions_ed_S3.loc[:, "next_" + col] = transfers_to_relevant_wards.set_index("subject_id").loc[
    /home/omersumer/Desktop/ts-ehr/external/camelot-icml/src/data_processing/MIMIC/admissions_processing.py:130: SettingWithCopyWarning:
    A value is trying to be set on a copy of a slice from a DataFrame.
    Try using .loc[row_indexer,col_indexer] = value instead

    See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
      admissions_ed_S3.loc[:, "next_" + col] = transfers_to_relevant_wards.set_index("subject_id").loc[
    /home/omersumer/Desktop/ts-ehr/external/camelot-icml/src/data_processing/MIMIC/admissions_processing.py:130: SettingWithCopyWarning:
    A value is trying to be set on a copy of a slice from a DataFrame.
    Try using .loc[row_indexer,col_indexer] = value instead

    See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
      admissions_ed_S3.loc[:, "next_" + col] = transfers_to_relevant_wards.set_index("subject_id").loc[
    /home/omersumer/Desktop/ts-ehr/external/camelot-icml/src/data_processing/MIMIC/admissions_processing.py:134: SettingWithCopyWarning:
    A value is trying to be set on a copy of a slice from a DataFrame.
    Try using .loc[row_indexer,col_indexer] = value instead

    See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
      admissions_ed_S3["age"] = admissions_ed_S3.intime.dt.year - admissions_ed_S3["anchor_year"] + admissions_ed_S3[
    Traceback (most recent call last):
      File "src/data_processing/MIMIC/run_processing.py", line 37, in <module>
        main()
      File "src/data_processing/MIMIC/run_processing.py", line 26, in main
        vitals_processing.main()
      File "/home/omersumer/Desktop/ts-ehr/external/camelot-icml/src/data_processing/MIMIC/vitals_processing.py", line 79, in main
        test.admissions_processed_correctly(admissions)
      File "/home/omersumer/Desktop/ts-ehr/external/camelot-icml/src/data_processing/MIMIC/test.py", line 75, in admissions_processed_correctly
        assert test_entrance_before_exit(df["intime"], df["outtime"])
    AssertionError
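
One way to narrow down the failing assertion is to list the stays whose entrance timestamp is later than their exit timestamp. The following is a minimal sketch using only the Python standard library, on the assumption that test_entrance_before_exit checks that each intime is not later than its outtime (its definition is not shown in this thread). The helper name find_inverted_stays and the example rows are hypothetical and made up for illustration; they are not part of the repository or of MIMIC.

```python
from datetime import datetime


def find_inverted_stays(rows, start_key="intime", end_key="outtime"):
    """Return the rows whose start timestamp is later than their end timestamp.

    Hypothetical helper: `rows` is any iterable of dicts holding ISO-style
    timestamp strings such as "2180-05-06 19:17:00".
    """
    inverted = []
    for row in rows:
        start = datetime.fromisoformat(row[start_key])
        end = datetime.fromisoformat(row[end_key])
        if start > end:
            inverted.append(row)
    return inverted


# Made-up rows for illustration only; they are not taken from MIMIC.
example_rows = [
    {"stay_id": 1, "intime": "2180-05-06 19:17:00", "outtime": "2180-05-06 23:30:00"},
    {"stay_id": 2, "intime": "2180-07-01 10:00:00", "outtime": "2180-07-01 09:45:00"},
]

inverted = find_inverted_stays(example_rows)
print([row["stay_id"] for row in inverted])
```

Running the same comparison on the processed admissions table just before the assertion could show whether the offending rows are already present in the raw files or are introduced by an earlier processing step.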

sumeromer commented 1 year ago

Here is a quick update regarding the issue:

Scoring information for this experiment

ROC-AUC value: [0.69488729 0.75383878 0.72695957 0.67002434]
F1 value: [0.         0.02402402 0.86579287 0.        ]
Recall value: [0.         0.01226994 0.99787234 0.        ]
Precision value: [0.         0.57142857 0.76459081 0.        ]
ARI value: 0.011248643132512373
NMI value: 0.006368103818763199
Silhouette value: 0.2257734007305569
DBI value: 1.686989537706597
VRI value: 24.53473699159825
Purity value: 0.08489307223484438

Confusion Matrix for predicting results
Predicted Class  De  I     W  Di
True Class                      
De                0  1    19   0
I                 0  8   644   0
W                 0  5  2345   0
Di                0  0    59   0
---

Scoring information for this experiment

ROC-AUC value: [0.73703038 0.68693313 0.65729866 0.5604774 ]
F1 value: [0.         0.01213961 0.86541298 0.        ]
Recall value: [0.         0.00613497 0.9987234  0.        ]
Precision value: [0.         0.57142857 0.76350033 0.        ]
ARI value: 0.0043416573852426464
NMI value: 0.0022140455605365052

Confusion Matrix for predicting results
Predicted Class  De  I     W  Di
True Class                      
De                0  0    20   0
I                 0  4   648   0
W                 0  3  2347   0
Di                0  0    59   0
---

Scoring information for this experiment

ROC-AUC value: [0.74459327 0.73911226 0.70786972 0.5903235 ]
F1 value: [0.         0.06451613 0.86761711 0.        ]
Recall value: [0.         0.03374233 0.99702128 0.        ]
Precision value: [0.         0.73333333 0.76794494 0.        ]
ARI value: 0.030210147661374066
NMI value: 0.01947228477908028
Silhouette value: 0.20218332008355194
DBI value: 5.401939838385961
VRI value: 41.10279826766778
Purity value: 0.08528976883407263

Confusion Matrix for predicting results
Predicted Class  De   I     W  Di
True Class                       
De                0   1    19   0
I                 0  22   630   0
W                 0   7  2343   0
Di                0   0    59   0

P.S.: I changed the seed numbers in both data_config and training_config; every run gave outcomes similar to those shared above.

hrna-ox commented 1 year ago

Hi @sumeromer, thanks for raising this. This was also raised internally, and I will come back to fix the errors within the next few weeks once I have some time. In the meantime, some comments:

sumeromer commented 1 year ago

Hi @hrna-ox, thank you very much for your swift reply.

Please do not spend time on the preprocessing issue: preprocessing works with version 1.0, which you used in the paper. I can help debug your preprocessing script and get it running on the latest MIMIC release.

Can you have a look at the performance metrics and the results? The model never correctly predicts death or discharge; nearly all predictions are ward. It is therefore more urgent to understand whether there is a flaw in the evaluation of your approach.
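
As a sanity check on the reported numbers, the per-class recall, precision and F1 can be recomputed directly from the first confusion matrix posted earlier in this thread. This is a plain-Python sketch, not the repository's evaluation code; the function name per_class_metrics is hypothetical, and the matrix is copied from the first result block with rows as true classes and columns as predicted classes.

```python
# Class order as printed in the thread: De, I, W, Di.
labels = ["De", "I", "W", "Di"]

# First confusion matrix from the thread (rows: true class, columns: predicted class).
confusion = [
    [0, 1, 19, 0],
    [0, 8, 644, 0],
    [0, 5, 2345, 0],
    [0, 0, 59, 0],
]


def per_class_metrics(matrix):
    """Return a (recall, precision, f1) tuple for each class of a square matrix."""
    size = len(matrix)
    results = []
    for k in range(size):
        true_pos = matrix[k][k]
        actual = sum(matrix[k])                               # row sum: true members
        predicted = sum(matrix[r][k] for r in range(size))    # column sum: predictions
        recall = true_pos / actual if actual else 0.0
        precision = true_pos / predicted if predicted else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        results.append((recall, precision, f1))
    return results


metrics = per_class_metrics(confusion)
total = sum(sum(row) for row in confusion)
accuracy = sum(confusion[k][k] for k in range(len(confusion))) / total
# Score obtained by always predicting the most frequent true class.
majority_share = max(sum(row) for row in confusion) / total

for name, (recall, precision, f1) in zip(labels, metrics):
    print(f"{name}: recall={recall:.4f} precision={precision:.4f} f1={f1:.4f}")
print(f"accuracy={accuracy:.4f} majority_share={majority_share:.4f}")
```

The recomputed values agree with the Recall, Precision and F1 arrays reported for that run, and the overall accuracy (about 0.764) is almost identical to the share of the W class (about 0.763), which is what always predicting W would score. This only shows that the reported scores are consistent with the matrix; it does not say where in training or data processing the collapse onto W originates.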