Predicting clinical outcomes from large scale cancer genomic profiles with deep survival models

agitter commented 7 years ago

https://doi.org/10.1101/131367 (http://biorxiv.org/content/early/2017/04/27/131367)

Translating the vast data generated by genomic platforms into accurate predictions of clinical outcomes is a fundamental challenge in genomic medicine. Many prediction methods face limitations in learning from the high-dimensional profiles generated by these platforms, and rely on experts to hand-select a small number of features for training prediction models. In this paper, we demonstrate how deep learning and Bayesian optimization methods that have been remarkably successful in general high-dimensional prediction tasks can be adapted to the problem of predicting cancer outcomes. We perform an extensive comparison of Bayesian optimized deep survival models and other state of the art machine learning methods for survival analysis, and describe a framework for interpreting deep survival models using a risk backpropagation technique. Finally, we illustrate that deep survival models can successfully transfer information across diseases to improve prognostic accuracy. We provide an open-source software implementation of this framework called SurvivalNet that enables automatic training, evaluation and interpretation of deep survival models.

@sw1 Does this relate to the topics you wrote about? Is there anything here worth discussing in the review?

aaronsheldon commented 7 years ago

Does this belong as an example at the end of section 3 under "Temporal Patient Trajectories"? Or in the middle of section 5, "Trajectory Prediction for Treatment"?

agitter commented 7 years ago

@aaronsheldon I must admit I haven't read it carefully. Do you have an opinion?

agitter commented 7 years ago

I accidentally closed this. I want to keep it open and potentially add it to our next version.

This paper has a nice assessment of fully connected neural networks versus Cox regression versus random forests for cancer survival prediction. They use gene expression features or "other" features (e.g. basic clinical attributes, mutations, copy number variants, and protein array data) from TCGA. The ability to predict survival well is cancer type-dependent. Models perform better in cancer types with worse outcomes.

Their strategy for assigning risk scores to individual features recovers previously characterized alterations that define tumor subtypes (e.g. IDH mutations in gliomas).

At a high level, there isn't strong evidence that the fully connected network outperforms elastic net Cox regression. The details of how the models perform on different feature sets, and when training on one cancer type versus multiple cancer types jointly, are interesting though.
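
For anyone who wants to reason about how such model comparisons are usually scored, here is a minimal sketch of Harrell's concordance index, a common metric for survival models that output a risk score. This is an illustration only, not code from the paper or from SurvivalNet, and this thread does not state which metric the authors report. The function name and toy inputs are hypothetical, and pairs with tied survival times are simply skipped as a simplification.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Fraction of comparable patient pairs that the risk scores rank correctly.

    times  -- follow-up time per patient
    events -- 1 if the event (death) was observed, 0 if censored
    risks  -- predicted risk per patient (higher = expected to fail sooner)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that patient a has the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Skip tied times (simplification) and pairs where the earlier
        # patient was censored, since their true ordering is unknown.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0  # earlier failure got the higher risk
        elif risks[a] == risks[b]:
            concordant += 0.5  # tied risks count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (made up): four patients, the third one censored.
toy_times = [1, 2, 3, 4]
toy_events = [1, 1, 0, 1]
perfect = concordance_index(toy_times, toy_events, [4, 3, 2, 1])
reversed_ranking = concordance_index(toy_times, toy_events, [1, 2, 3, 4])
uninformative = concordance_index(toy_times, toy_events, [0, 0, 0, 0])
```

A value of 0.5 corresponds to an uninformative ranking and 1.0 to a perfect one, so small differences between two model families on this scale are hard to interpret without repeated train/test splits.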

greenelab / deep-review

Predicting clinical outcomes from large scale cancer genomic profiles with deep survival models #362