m-kno / mimic3


Prediction of Extubation Success in Patients with Acute Respiratory Syndrome


The authors: Mirko Knoche, Jacqueline Gabriel, Niko Stergioulas and Nina Notman

This project is still in progress.

Overview

This project is based on MIMIC-III, a critical care database. MIMIC-III is a large, single-center database comprising information relating to patients admitted to critical care units at a large tertiary care hospital. Data includes vital signs, medications, laboratory measurements, observations and notes charted by care providers, fluid balance, procedure codes, diagnostic codes, imaging reports, hospital length of stay, survival data, and more. The database supports applications including academic and industrial research, quality improvement initiatives, and higher education coursework. The following Nature article gives further information on data selection and description.

The work in this repository is part of the final assessment of the Data Science Bootcamp at Neue Fische - School and Pool for Digital Talent. Here, we aim to predict the success of an extubation attempt in intensive care patients suffering from acute respiratory syndrome. Our study design leans on a study by Mikhno and colleagues, who predicted extubation failure for neonates with respiratory distress syndrome using the MIMIC-II Clinical Database.

Business Case: Predicting the likelihood of a successful extubation may decrease the incidence of failed extubations in intensive care units (ICUs). This may contribute significantly to patients' health, as failed extubations are associated with an increased risk of subsequent unplanned extubations, the use of noninvasive ventilation after extubation, and sepsis (Lee et al., 2017).

The project is intended to cover all stages of the data science cycle:

Life cycle of data science

Tools and Technologies used

Database Management with SQL: PostgreSQL / DBeaver
Analysis with Python: Pandas / NumPy / scikit-learn / Matplotlib / Seaborn

ML models used

The models were applied and compared by their F0.5 score, an F-beta score with beta = 0.5 that weights precision more heavily than recall.
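To illustrate why the F0.5 score favours models with few false positives, here is a minimal sketch that computes the F-beta score from confusion-matrix counts. The function name `fbeta` and the counts for the two classifiers are hypothetical and chosen only for illustration; they are not results from this project.

```python
def fbeta(tp: int, fp: int, fn: int, beta: float = 0.5) -> float:
    """F-beta score from true positives, false positives and false negatives."""
    b2 = beta ** 2
    denom = (1 + b2) * tp + b2 * fn + fp
    # Return 0.0 when there are no positives at all, to avoid dividing by zero.
    return (1 + b2) * tp / denom if denom else 0.0


# Hypothetical classifier A: higher precision (60/80), lower recall (60/100).
a_f05 = fbeta(tp=60, fp=20, fn=40, beta=0.5)
a_f1 = fbeta(tp=60, fp=20, fn=40, beta=1.0)

# Hypothetical classifier B: lower precision (80/120), higher recall (80/100).
b_f05 = fbeta(tp=80, fp=40, fn=20, beta=0.5)
b_f1 = fbeta(tp=80, fp=40, fn=20, beta=1.0)

print(f"A: F0.5={a_f05:.3f}  F1={a_f1:.3f}")
print(f"B: F0.5={b_f05:.3f}  F1={b_f1:.3f}")
```

With these counts, the F1 score ranks B above A, while the F0.5 score ranks A above B, because false positives are penalised more strongly than false negatives.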

Content

Jupyter Notebook 1 (OverviewData): Data Exploration and merging of tables
Jupyter Notebook 2 (SelectingPatientGroup): Selecting patient group and defining label (extubation outcome)
Jupyter Notebook 3 (EDA): Data cleaning
Jupyter Notebook 4 (DataPreparation): Checking for correlations between the features
Jupyter Notebook 5 (Modeling): Train-Test-Split, Scaling, dummy classifier, base models, as well as random search, grid search and feature importance for each base model
Jupyter Notebook 6 (FeatureEngineering): Engineering of new features
Jupyter Notebook 7 (AdvancedModels): Train-Test-Split, Scaling, dummy classifier, advanced models, as well as random search, grid search and feature importance for each advanced model with added features
Jupyter Notebook 8 (DroppingTracheoColumn): repeating the steps of notebook 7 but dropping the column "tracheo" before running the model.
Jupyter Notebook 9 (DroppingTracheoPatients): repeating the steps of notebook 7 but dropping all patients that have been tracheotomized.
SQL-Queries (Query Archiv)
Images
Presentation: Presentation of the capstone project
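The modeling steps listed for notebooks 5 and 7 (train-test split, scaling, dummy classifier, grid search scored by F0.5) could look roughly like the following scikit-learn sketch. It is not the code from the notebooks: the data is synthetic, logistic regression is only a stand-in model, the parameter grid is an assumption, and random search and feature importance are left out for brevity.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import fbeta_score, make_scorer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption); the real features come from MIMIC-III.
X, y = make_classification(n_samples=400, n_features=8, random_state=0)

# Train-test split, stratified so both sets keep the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Baseline: always predict the most frequent class.
dummy = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
dummy_f05 = fbeta_score(y_test, dummy.predict(X_test), beta=0.5, zero_division=0)

# Scaling and model in one pipeline, so the scaler is fit on training folds only.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Grid search over a hypothetical parameter grid, scored by F0.5.
f05_scorer = make_scorer(fbeta_score, beta=0.5)
grid = GridSearchCV(
    pipe,
    param_grid={"logisticregression__C": [0.01, 0.1, 1.0, 10.0]},
    scoring=f05_scorer,
    cv=5,
)
grid.fit(X_train, y_train)
model_f05 = fbeta_score(y_test, grid.predict(X_test), beta=0.5)

print(f"Dummy F0.5: {dummy_f05:.3f}")
print(f"Best params: {grid.best_params_}")
print(f"Model F0.5: {model_f05:.3f}")
```

Putting the scaler inside the pipeline keeps test-fold statistics out of the cross-validation, which is one common way to avoid leakage during hyperparameter search.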

Conclusion

We mainly focused on the F-beta score with beta = 0.5 as the evaluation metric, to keep the number of false positives as low as possible. Feature importance is a first attempt to tackle interpretability. If there were a higher demand for interpretability, we would prefer transparent classifiers such as Decision Tree, Logistic Regression or linear SVM. In the end, 75% of positive identifications are actually correct (precision). The accuracy of 0.75 tells us that about 75% of all predictions are correct. Compared to the success rate from the database (Precision: 66%, Accuracy: 55%), the goal of improving the success of extubations has been achieved.

The important features of the best XGBoost Model:

Future work

Further Feature Engineering

Modelling