greenelab / deep-review

A collaboratively written review paper on deep learning, genomics, and precision medicine
https://greenelab.github.io/deep-review/

Deep Patient: An Unsupervised Representation to Predict the Future of Patients from the Electronic Health Records #63

Closed by agitter 7 years ago

agitter commented 8 years ago

http://doi.org/10.1038/srep26094

cgreene commented 8 years ago

Cross-referencing #25. Both should be discussed together, as they are essentially concurrent applications. These do seem relevant to the topic. @brettbj, willing to share thoughts on this? I know you have read them in depth. Might want to see if Joel or someone from his lab would contribute thoughts on #25. Clearly both groups are thinking about this.

XieConnect commented 7 years ago

@brettbj If Brett is busy with other tasks, I can help present a brief summary of this paper, and maybe also contrast several other EHR phenotyping and patient-subtyping works. Mine will be much shorter, though, and restricted to their relevance to the survey.

cgreene commented 7 years ago

@XieConnect - sounds great! @brettbj : what do you think?

brettbj commented 7 years ago

This slipped down the to-do list. @XieConnect, go for it; I'll add any thoughts I have to your summary.

XieConnect commented 7 years ago

TL;DR: A 3-layer stack of denoising autoencoders was applied to EHR data to learn patient representations, and a random forest classifier was trained for disease prediction on a held-out dataset.

Goal:

Use unsupervised deep learning to learn better patient representations. This will then help improve subsequent tasks, such as disease prediction (classification) using standard machine learning models.

Method:

A 3-layer stack of denoising autoencoders was used in the initial unsupervised training stage. A random forest classifier was subsequently used in the final disease-prediction step.

Hyperparameter tuning for the autoencoders was performed on a separate validation dataset (tuned by monitoring the supervised classification performance).

Hyperparameters:

A 3-layer stack of denoising autoencoders (deeper stacks were also tried, reportedly with no obvious improvement); 500 hidden units per layer; 5% noise corruption factor (zero-out masking noise); sigmoid activation function; 100 trees in the random forest.
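To make the architecture concrete, here is a minimal PyTorch sketch of the setup described above (not the authors' implementation): three denoising autoencoders with 500 sigmoid hidden units and 5% zero-masking noise, pretrained greedily layer by layer. The hyperparameter values come from the summary above; the optimizer, epoch count, and MSE reconstruction loss are assumptions for illustration.

```python
# Minimal sketch (assumptions, not the authors' code) of the Deep Patient setup:
# a 3-layer stack of denoising autoencoders with 500 sigmoid hidden units per
# layer and 5% zero-masking noise, pretrained greedily layer by layer.
import torch
import torch.nn as nn


class DenoisingAutoencoder(nn.Module):
    def __init__(self, n_in, n_hidden=500, noise=0.05):
        super().__init__()
        self.noise = noise
        self.encode = nn.Sequential(nn.Linear(n_in, n_hidden), nn.Sigmoid())
        self.decode = nn.Sequential(nn.Linear(n_hidden, n_in), nn.Sigmoid())

    def corrupt(self, x):
        # zero-masking noise: each input is zeroed out with probability `noise`
        return x * (torch.rand_like(x) > self.noise).float()

    def forward(self, x):
        return self.decode(self.encode(self.corrupt(x)))


def pretrain_stack(data, n_layers=3, n_hidden=500, epochs=20, lr=1e-3):
    """Greedy layer-wise pretraining; returns the list of trained encoders.

    `data` is a float tensor of patients x features. The optimizer, epochs,
    and the MSE reconstruction loss are assumptions (the paper describes a
    reconstruction-error objective but this is not its exact training code).
    """
    encoders, current = [], data
    for _ in range(n_layers):
        dae = DenoisingAutoencoder(current.shape[1], n_hidden)
        opt = torch.optim.Adam(dae.parameters(), lr=lr)
        loss_fn = nn.MSELoss()
        for _ in range(epochs):
            opt.zero_grad()
            loss = loss_fn(dae(current), current)
            loss.backward()
            opt.step()
        encoders.append(dae.encode)
        with torch.no_grad():
            current = dae.encode(current)  # hidden codes feed the next layer
    return encoders
```

The 500-dimensional representation from the top layer is what would then feed the random forest classifier in the prediction step.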

Data:

EHR data from Mount Sinai, filtered by thresholding on ICD-9 code frequencies. Patient features include demographics, structured clinical descriptors (ICD-9 codes), medications, procedures, lab tests, and text-mining (NLP) features (mainly topics from topic modeling). Patient disease labels were obtained by simple categorization of ICD-9 codes, representing 78 diseases after filtering.

Data from 2014 were held out for testing; data from before 2014 were used for training.

Final training data: 704,587 patients, and 41,072 features.
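For illustration, a hedged pandas sketch of the data preparation summarized above; the file name, column names, frequency threshold, and the `icd9_to_group` mapping are all hypothetical placeholders, not details taken from the paper.

```python
# Hedged sketch of the data preparation described above. File name, column
# names, the frequency threshold, and the icd9_to_group lookup are assumed
# placeholders, not details from the paper.
import pandas as pd

records = pd.read_csv("ehr_icd9.csv")  # columns: patient_id, icd9, year (assumed)

# keep only ICD-9 codes above a frequency threshold (threshold value assumed)
counts = records["icd9"].value_counts()
records = records[records["icd9"].isin(counts[counts >= 10].index)]

# collapse ICD-9 codes into broader disease categories (78 groups in the paper)
# and pivot into a binary patient-by-disease label matrix
icd9_to_group = {"250.00": "diabetes mellitus", "250.01": "diabetes mellitus"}  # illustrative only
records["disease"] = records["icd9"].map(icd9_to_group)
labels = pd.crosstab(records["patient_id"], records["disease"]).clip(upper=1)

# temporal split: 2014 records held out for testing, earlier years for training
train_records = records[records["year"] < 2014]
test_records = records[records["year"] == 2014]
```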

Experiments

Data description:

Training: 704,587 patients (unsupervised training). Validation: 5,000 patients (supervised evaluation). Testing: 76,214 patients (supervised evaluation).

Baselines:

For unsupervised representation learning, Deep Patient was compared against several common unsupervised approaches: raw features, PCA, Gaussian mixture models, k-means, and ICA.

Performance was judged by the final disease prediction/classification built on the previously learned unsupervised features. Metrics include area under the ROC curve (AUC), accuracy, and F-score.

Deep Patient consistently outperformed the alternatives.
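A sketch of how that supervised comparison could look in scikit-learn, assuming a one-vs-rest random forest per disease scored by ROC AUC on the held-out year; the variable names (`raw_train`, `deep_train`, `Y_train`, ...) are placeholders, and the 100 principal components for the PCA baseline is the figure mentioned in a later comment rather than a value verified here.

```python
# Sketch (assumed pipeline, not the authors' code): train a 100-tree random
# forest per disease on each representation and compare mean ROC AUC.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score


def mean_auc(train_X, train_Y, test_X, test_Y):
    """train_Y/test_Y: binary patient-by-disease matrices (ICD-9 disease groups)."""
    aucs = []
    for d in range(train_Y.shape[1]):
        clf = RandomForestClassifier(n_estimators=100, random_state=0)
        clf.fit(train_X, train_Y[:, d])
        scores = clf.predict_proba(test_X)[:, 1]  # assumes both classes occur in training
        aucs.append(roc_auc_score(test_Y[:, d], scores))
    return float(np.mean(aucs))


# PCA baseline: project the raw features onto the first 100 principal components
pca = PCA(n_components=100).fit(raw_train)   # raw_train: patients x raw features (placeholder)
auc_pca = mean_auc(pca.transform(raw_train), Y_train,
                   pca.transform(raw_test), Y_test)

# Deep Patient: representations from the top autoencoder layer (placeholder arrays)
auc_deep = mean_auc(deep_train, Y_train, deep_test, Y_test)
```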

Strength:

The presentation is easy to follow. The study includes many experiments, a relatively large dataset, and a variety of diseases.

Weakness:

It seems to lack a direct experimental comparison with an earlier related work (T. Lasko, PLoS ONE, 2013), whose data and method are similar.

Patient disease labels were derived by abstracting ICD-9 codes, the accuracy of which is a known issue from a medical-diagnosis perspective. Expert chart review would be needed to validate the clinical relevance of the labels.

cgreene commented 7 years ago

Discussed in #167

ghost commented 7 years ago

LOL, I came across this while Googling about the paper. I think, just like all machine learning for healthcare papers, it's a poorly conducted study. The reported results are compared with PCA, raw features, and other non-deep approaches. While they report Deep Patient outperforming PCA in prediction of all but one diagnosis, it's not apparent to what extent PCA or the other competing approaches were tuned. It's very likely that some diagnoses require more than the first 100 principal components or a better-tuned random forest classifier (more estimators). Finally, they claim to have selected "patients having at least one new ICD-9 diagnosis assigned". However, when assigning labels in the disease classification task, the labels are groups of ICD-9 codes, so it's likely that a patient already had a code from the group in past visits. E.g., diabetes is represented by several codes.