Prediction of Extubation Success in Patients with Acute Respiratory Syndrome


The authors: Mirko Knoche, Jacqueline Gabriel, Niko Stergioulas and Nina Notman

This project is still in progress.

Overview

This project is based on MIMIC-III, a critical care database. MIMIC-III is a large, single-center database comprising information relating to patients admitted to critical care units at a large tertiary care hospital. Data includes vital signs, medications, laboratory measurements, observations and notes charted by care providers, fluid balance, procedure codes, diagnostic codes, imaging reports, hospital length of stay, survival data, and more. The database supports applications including academic and industrial research, quality improvement initiatives, and higher education coursework. A Nature article gives further information on data selection and description.

The work in this repository is part of the final assessment of the Data Science Bootcamp at Neue Fische - School and Pool for Digital Talent. Here, we aim to predict the success of an extubation attempt on intensive care patients suffering from acute respiratory syndrome. Our study design leans on a study by Mikhno and colleagues, who predicted extubation failure for neonates with respiratory distress syndrome using the MIMIC-II Clinical Database.

Business case: predicting the likelihood that an extubation will succeed may decrease the incidence of failed extubations in intensive care units (ICUs). This may contribute significantly to patients' health, as failed extubations are associated with an increased risk of subsequent unplanned extubations, the use of noninvasive ventilation postextubation, and sepsis (Lee et al., 2017).

The project is intended to cover all stages of the data science cycle:

Life cycle of data science

1 Medical understanding
2 Data acquisition
3 Data cleansing
4 Data exploration
5 Feature engineering
6 Predictive modeling
7 Data visualization

Tools and Technologies used

Database Management with SQL: PostgreSQL / DBeaver
Analysis with Python: Pandas / NumPy / scikit-learn / Matplotlib / Seaborn

ML models used

The following models were trained and compared by their F0.5 score:

Logistic Regression
Decision Tree
Random Forest
XGBoost
AdaBoost
KNN (k-nearest neighbors)
Support Vector Machine
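
For reference, the F0.5 score used for the comparison is the standard F-beta score with beta = 0.5. The definition below is the general textbook form, not something taken from the notebooks; P denotes precision, R recall, and TP, FP, FN the counts of true positives, false positives and false negatives.

```latex
P = \frac{TP}{TP + FP}, \qquad
R = \frac{TP}{TP + FN}, \qquad
F_\beta = \frac{(1 + \beta^2)\, P\, R}{\beta^2 P + R}

% With \beta = 0.5:
F_{0.5} = \frac{1.25\, P\, R}{0.25\, P + R}
```

Because beta is smaller than 1, precision is weighted more heavily than recall, so false positives lower the score more than false negatives do. In scikit-learn this metric is available as `sklearn.metrics.fbeta_score` with `beta=0.5`.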

Content

Jupyter Notebook 1 (OverviewData): Data Exploration and merging of tables
Jupyter Notebook 2 (SelectingPatientGroup): Selecting patient group and defining label (extubation outcome)
Jupyter Notebook 3 (EDA): Data cleaning
Jupyter Notebook 4 (DataPreparation): Checking for correlations between the features
Jupyter Notebook 5 (Modeling): Train-Test-Split, Scaling, dummy classifier, base models, as well as random search, grid search and feature importance for each base model
Jupyter Notebook 6 (FeatureEngineering): Engineering of new features
Jupyter Notebook 7 (AdvancedModels): Train-Test-Split, Scaling, dummy classifier, advanced models, as well as random search, grid search and feature importance for each advanced model with added features
Jupyter Notebook 8 (DroppingTracheoColumn): repeating the steps of notebook 7 but dropping the column "tracheo" before running the model.
Jupyter Notebook 9 (DroppingTracheoPatients): repeating the steps of notebook 7 but dropping all patients that have been tracheotomized.
SQL-Queries (Query Archiv)
Images
Presentation: Presentation of the capstone project
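
Notebooks 8 and 9 differ in what is removed before modeling: notebook 8 removes the "tracheo" feature for every patient, while notebook 9 removes the tracheotomized patients themselves. The following is a minimal sketch of that difference using plain Python dictionaries. The rows, the `resp_rate` and `label` field names, and the 0/1 encoding of "tracheo" are assumptions for illustration; the notebooks presumably do this with pandas.

```python
# Hypothetical toy rows; only the column name "tracheo" comes from the README.
patients = [
    {"id": 1, "tracheo": 0, "resp_rate": 18, "label": 1},
    {"id": 2, "tracheo": 1, "resp_rate": 24, "label": 0},
    {"id": 3, "tracheo": 0, "resp_rate": 20, "label": 1},
]


def drop_column(rows, column):
    """Notebook 8 style: keep every patient, remove one feature."""
    return [{k: v for k, v in row.items() if k != column} for row in rows]


def drop_tracheotomized(rows):
    """Notebook 9 style: keep every feature, remove tracheotomized patients."""
    return [row for row in rows if row["tracheo"] == 0]


without_column = drop_column(patients, "tracheo")
without_patients = drop_tracheotomized(patients)
```

The first variant tests whether the model still performs without the feature; the second tests whether it performs on the subgroup for which the feature carries no information.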

Conclusion

We mainly focused on the F-beta score with beta = 0.5 as the evaluation metric, to keep the number of false positives as low as possible. Feature importance is a first attempt to address interpretability. If interpretability were a stronger requirement, we would prefer transparent classifiers such as Decision Tree, Logistic Regression or linear SVM. In the end, 75% of positive predictions are actually correct (precision of 0.75), and the accuracy of 0.75 tells us that about 75% of all predictions are correct. Compared to the success rate from the database (precision: 66%, accuracy: 55%), the goal of improving the success of extubations has been achieved.
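
To make the metrics in this comparison concrete, here is a minimal sketch that computes precision, accuracy and F0.5 for a classifier and for a naive baseline that predicts success for every patient. The labels and predictions are invented toy values (1 = successful extubation), not results from this project, and the function names are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels, where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def scores(y_true, y_pred, beta=0.5):
    """Compute precision, accuracy and F-beta from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    b2 = beta ** 2
    f_denominator = (1 + b2) * tp + b2 * fn + fp
    return {
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "accuracy": (tp + tn) / len(y_true),
        "f_beta": (1 + b2) * tp / f_denominator if f_denominator else 0.0,
    }


# Toy data: 6 successful and 4 failed extubations (assumed values).
y_true = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
baseline_pred = [1] * len(y_true)             # always predict success
model_pred = [1, 1, 1, 1, 0, 0, 1, 0, 0, 0]   # hypothetical classifier output

baseline_scores = scores(y_true, baseline_pred)
model_scores = scores(y_true, model_pred)
```

On this toy data the hypothetical classifier trades some recall for fewer false positives, which is the behavior an F0.5 objective rewards.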

The important features of the best XGBoost Model:

Is the patient tracheotomized?
SaO2 and PaO2 (min, median)
Respiratory rate
The ratio of arterial partial pressure of oxygen to inspired fractional concentration of oxygen
Maximum of the fraction of inspired oxygen
Age
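
Several of these features are per-patient aggregates of repeated measurements. The sketch below shows how such aggregates could be derived; the measurement values, the dictionary keys and the pairing of PaO2 with FiO2 readings by position are assumptions for illustration, and FiO2 is assumed to be stored as a fraction between 0.21 and 1.0.

```python
from statistics import median


def pf_ratio(pao2_mmhg, fio2_fraction):
    """Ratio of arterial oxygen partial pressure to inspired oxygen fraction."""
    return pao2_mmhg / fio2_fraction


# Hypothetical measurements for one patient before extubation.
# pao2 and fio2 are assumed to be recorded at matching chart times.
patient = {
    "sao2": [92, 95, 97, 96],   # percent
    "pao2": [70, 85, 90],       # mmHg
    "fio2": [0.5, 0.4, 0.4],    # fraction of inspired oxygen
}

ratios = [pf_ratio(p, f) for p, f in zip(patient["pao2"], patient["fio2"])]

features = {
    "sao2_min": min(patient["sao2"]),
    "sao2_median": median(patient["sao2"]),
    "pao2_min": min(patient["pao2"]),
    "pao2_median": median(patient["pao2"]),
    "fio2_max": max(patient["fio2"]),
    "pf_ratio_min": min(ratios),
}
```

If FiO2 were charted in percent instead, it would need to be divided by 100 before computing the ratio.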

Future work

Further Feature Engineering

Add BMI and weight change as additional features
Add medications, treatments and their charttime as features
Take fluid input and output into account
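
As a rough sketch of the first item, BMI and weight change could be computed as follows. The function names are hypothetical, and the sketch assumes weight in kilograms, height in meters, and weights ordered by chart time.

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight in kilograms divided by height in meters squared."""
    return weight_kg / height_m ** 2


def weight_change(weights_kg):
    """Difference between the last and first recorded weight (negative = loss)."""
    return weights_kg[-1] - weights_kg[0]


# Hypothetical values for one patient.
example_bmi = bmi(81.0, 1.80)
example_change = weight_change([82.0, 81.0, 79.5])
```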

Modelling

Examine misclassified patients
Neural networks
Time Series analysis
