greenelab / deep-review

A collaboratively written review paper on deep learning, genomics, and precision medicine
https://greenelab.github.io/deep-review/

A Deep Boosting Based Approach for Capturing the Sequence Binding Preferences of RNA-Binding Proteins from High-Throughput CLIP-Seq Data #138

Open gwaybio opened 7 years ago

gwaybio commented 7 years ago

Li, Dong, Wu et al. 2016

bioRxiv

http://doi.org/10.1101/086421

Abstract

Characterizing the binding behaviors of RNA-binding proteins (RBPs) is important for understanding their functional roles in gene expression regulation. However, current high-throughput experimental methods for identifying RBP targets, such as CLIP-seq and RNAcompete, usually suffer from the false positive and false negative issues. Here, we develop a deep boosting based machine learning approach, called DeBooster, to accurately model the binding sequence preferences and identify the corresponding binding targets of RBPs from CLIP-seq data. Comprehensive validation tests have shown that DeBooster can outperform other state-of-the-art approaches in predicting RBP targets and recover false negatives that are common in current CLIP-seq data. In addition, we have demonstrated several new potential applications of DeBooster in understanding the regulatory functions of RBPs, including the binding effects of the RNA helicase MOV10 on mRNA degradation, the influence of different binding behaviors of the ADAR proteins on RNA editing, as well as the antagonizing effect of RBP binding on miRNA repression. Moreover, DeBooster may provide an effective index to investigate the effect of pathogenic mutations in RBP binding sites, especially those related to splicing events. We expect that DeBooster will be widely applied to analyze large-scale CLIP-seq experimental data and can provide a practically useful tool for novel biological discoveries in understanding the regulatory mechanisms of RBPs. The source code of DeBooster can be downloaded from http://github.com/dongfanghong/deepboost.

GitHub

Nice to see code is provided: http://github.com/dongfanghong/deepboost by @dongfanghong. Happy to get your input here as well!

Summary

Biological

Predicts the sequence specificity of RNA-binding proteins (RBPs) from CLIP-seq data. Models are built from the binding sites observed in the data, including their local sequence context.

Computational

A deep boosting model (an ensemble of weighted decision trees) outputs the expected binding motifs of RBPs. The problem the authors are trying to overcome is the high false positive and false negative rates in CLIP-seq data.
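To make "ensemble of weighted decision trees" concrete, here is a minimal sketch of the general idea. It is not the DeBooster code and not the deep boosting algorithm itself: it uses plain AdaBoost over depth-1 trees (stumps) as a stand-in, with 2-mer presence features, and the toy sequences and all function names are made up for illustration.

```python
import math
from itertools import product


def kmer_features(seq, k=2, alphabet="ACGU"):
    """Binary presence vector over all k-mers of the alphabet."""
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    present = {seq[i:i + k] for i in range(len(seq) - k + 1)}
    return [1 if km in present else 0 for km in kmers]


def stump_predict(x, j, polarity):
    """Depth-1 tree: predict `polarity` if feature j is present, else -polarity."""
    return polarity if x[j] else -polarity


def fit_stump(X, y, w):
    """Return (weighted error, feature index, polarity) of the best stump."""
    best = None
    for j in range(len(X[0])):
        for polarity in (1, -1):
            err = sum(wi for xi, yi, wi in zip(X, y, w)
                      if stump_predict(xi, j, polarity) != yi)
            if best is None or err < best[0]:
                best = (err, j, polarity)
    return best


def adaboost(X, y, rounds=5):
    """Plain AdaBoost (assumption: a stand-in for deep boosting)."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, j, polarity = fit_stump(X, y, w)
        # Clip the error so a perfect stump does not produce log(1/0).
        err = min(max(err, 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, j, polarity))
        # Up-weight misclassified examples, down-weight correct ones.
        w = [wi * math.exp(-alpha * yi * stump_predict(xi, j, polarity))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(x, j, p) for alpha, j, p in ensemble)
    return 1 if score >= 0 else -1


# Hypothetical toy data: "bound" sequences all contain the 2-mer UG.
bound = ["AUGCA", "CCUGA", "GUGUU", "UGAAC"]
unbound = ["AACCA", "CCGAA", "GACCU", "ACACG"]
X = [kmer_features(s) for s in bound + unbound]
y = [1] * len(bound) + [-1] * len(unbound)
ensemble = adaboost(X, y)
```

Calling `predict(ensemble, kmer_features(seq))` then gives +1 (bound) or -1 (unbound) for a new sequence. Deep boosting differs from this sketch in that it can mix trees of different depths and penalizes the more complex ones, so treat the above only as an illustration of the weighted-vote structure.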

This appears to be the second-generation classifier of #15 (work from the same lab), and it appears to perform better.

agitter commented 7 years ago

@gwaygenomics I only spent a couple of minutes with this paper, but my shallow understanding is that this is not a deep learning method. They use deep boosting instead. In addition, #15 used structure as input, whereas this newer method is sequence-only.

Were you thinking of including this as an example of where a deep learning method has been surpassed by other approaches? Figure 2a shows deep boosting almost always outperforms the deep belief net (#15), if I'm reading it correctly.