greenelab / deep-review

A collaboratively written review paper on deep learning, genomics, and precision medicine
https://greenelab.github.io/deep-review/

gkm-DNN: efficient prediction using gapped k-mer features and deep neural networks #634

Open michaelmhoffman opened 7 years ago

michaelmhoffman commented 7 years ago

https://doi.org/10.1101/170761

Extracting informative features from genome sequence is a challenging problem. Gapped k-mer frequency vectors (gkm-fvs) have been presented as a new type of feature in the last few years. Coupled with a support vector machine (gkm-SVM), gkm-fvs have been used to achieve effective sequence-based prediction (e.g., transcription factor binding site prediction). However, the heavy computation of a large kernel matrix prevents gkm-SVM from using large amounts of data. To this end, we proposed a flexible and scalable framework, gkm-DNN, to achieve feature representation and prediction from high-dimensional gkm-fvs using deep neural networks (DNNs). We first implemented an efficient method to calculate the gkm-fv of a given sequence. We then adopted a DNN model with gkm-fvs as input to achieve a prediction task. Here, we took transcription factor binding site prediction as an illustrative application. We applied gkm-DNN to 467 small and 69 big human ENCODE ChIP-seq datasets to demonstrate its performance and compared it with the state-of-the-art method gkm-SVM. We demonstrated that gkm-DNN can not only overcome the drawbacks of high dimensionality, collinearity and sparsity of gkm-fvs, but also achieve comparable overall performance and distinctly better accuracy compared with gkm-SVM in much shorter time. Moreover, gkm-DNN can be easily adapted to other applications and can combine different types of data using computational graphs.
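
For readers unfamiliar with the feature type, here is a minimal sketch of what a gapped k-mer count vector looks like. It is a naive enumeration and not the efficient method the abstract refers to. It assumes the usual gkm-SVM-style definition: slide a window of length `l` along the sequence and, for every choice of `k` informative positions, record the bases at those positions with the rest masked as gaps. The function names, the `-` gap symbol, and the default `l` and `k` are illustrative choices, and reverse complements are not merged.

```python
from collections import Counter
from itertools import combinations
from math import comb


def gapped_kmer_counts(seq, l=4, k=2):
    """Count gapped k-mers in seq (naive sketch, not the paper's algorithm).

    Each length-l window contributes one pattern per choice of k kept
    positions; the remaining l - k positions are masked with '-'.
    """
    seq = seq.upper()
    counts = Counter()
    masks = list(combinations(range(l), k))  # all ways to pick k of l positions
    for start in range(len(seq) - l + 1):
        window = seq[start:start + l]
        # Skip windows containing ambiguous bases such as 'N'.
        if any(base not in "ACGT" for base in window):
            continue
        for keep in masks:
            pattern = "".join(
                window[i] if i in keep else "-" for i in range(l)
            )
            counts[pattern] += 1
    return counts


def feature_dimension(l, k):
    """Number of distinct gapped k-mers: C(l, k) gap layouts times 4**k bases."""
    return comb(l, k) * 4 ** k


counts = gapped_kmer_counts("ACGTA", l=4, k=2)
```

The dimension grows as `comb(l, k) * 4**k`, while a single sequence fills only a small fraction of the entries, which is consistent with the high dimensionality and sparsity the abstract mentions.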

agitter commented 7 years ago

@michaelmhoffman do you plan to add this to your new gkm-SVM discussion in #623 or should we save it for a subsequent pull request?

akundaje commented 7 years ago

Just FYI, I wrote a few comments about this paper in the bioRxiv comments section.

agitter commented 7 years ago

Linking @akundaje's comments

michaelmhoffman commented 7 years ago

I think we should save it. Also, thanks @akundaje.