greenelab / deep-review

A collaboratively written review paper on deep learning, genomics, and precision medicine
https://greenelab.github.io/deep-review/

A deep learning framework for modeling structural features of RNA-binding protein targets #15

Open cgreene opened 8 years ago

cgreene commented 8 years ago

https://dx.doi.org/10.1093/nar/gkv1025

gwaybio commented 8 years ago

Deep belief network (DBN) to predict the RNA sequences bound by RNA-binding proteins (RNP). The DBN is multimodal and integrates information from primary, secondary, and tertiary structural features (the DBN is similar in architecture and multimodal strategy to #14). Engineering the features requires a significant amount of domain expertise (a similar amount of effort to #49).

Biology

CLIP-seq datasets provide gold-standard RNA-RNP interactions. Predicting the motif where a given RNP binds is an important biological question that could have wide-ranging implications extending into medical applications. Primary, secondary, and tertiary structures were all carefully pre-processed and/or predicted by existing algorithms (e.g. RNAshapes for secondary structure).

Computation

Multi-modal DBN trained layer-wise with 10-fold cross-validation, dropout, and L2 regularization across 24 different CLIP-based datasets. The details of how they train with back-propagation are not clear to me (they are in the supplemental data). It is also not clear to me how they generate binding motifs. It appears that training/reporting performance is separate from generating motifs. They generate motifs by restricting the model to output the probability that the generated motif binds minus the probability that it does not bind. They then use this difference in p as the probability of motif binding.

They report AUROC for their network with and without tertiary structure and observe that tertiary structure improves performance in almost every dataset. The code is built on deepnet and is publicly available (Python).
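For anyone less familiar with the AUROC comparison described above, here is a minimal sketch of how such a with/without-tertiary-structure comparison could be scored. It is not the paper's code: the labels and scores are made-up toy values, and the names `auroc`, `without_tertiary_scores`, and `with_tertiary_scores` are hypothetical. It uses the rank-based definition of AUROC (the chance that a random bound site outscores a random unbound one).

```python
def auroc(labels, scores):
    """AUROC as the probability that a random positive outscores a random
    negative, with ties counted as half a win (Mann-Whitney formulation)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (assumption, not from the paper): 1 = bound site, 0 = unbound site.
labels = [1, 1, 1, 0, 0, 0]
# Hypothetical model outputs without and with tertiary-structure features.
without_tertiary_scores = [0.9, 0.4, 0.6, 0.5, 0.3, 0.2]
with_tertiary_scores = [0.9, 0.7, 0.6, 0.5, 0.3, 0.2]

auroc_without = auroc(labels, without_tertiary_scores)
auroc_with = auroc(labels, with_tertiary_scores)
print(f"AUROC without tertiary: {auroc_without:.3f}")
print(f"AUROC with tertiary:    {auroc_with:.3f}")
```

In practice this would be computed per cross-validation fold and per dataset, then compared across the 24 datasets.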

kumardeep27 commented 8 years ago

There is a related recent tool (also about RBPs) that uses deep neural networks along with stacked ensembling on sequence features of proteins and ncRNAs. IPMiner: hidden ncRNA-protein interaction sequential pattern mining with stacked autoencoder for accurate computational prediction https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-016-2931-8 Tool availability: [http://www.csbio.sjtu.edu.cn/bioinf/IPMiner] Interaction Pattern Miner (IPMiner) predicts sequence-based ncRNA-protein interacting motifs using stacked ensembles. Stacked autoencoders are wrapped under Random Forest (RF) models, and a final stacked ensembling step combines the prediction scores of the multiple encoders. The method claims to achieve 20% better performance than existing tools in the field.
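To make the "final stacked ensembling" step concrete, here is a minimal sketch of combining several base-model scores with a meta-learner. This is not IPMiner's implementation: the logistic-regression meta-learner, the function names `fit_meta_learner` and `stacked_predict`, and the toy scores are all assumptions for illustration only.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_meta_learner(base_scores, labels, lr=0.5, epochs=2000):
    """Fit logistic-regression weights over base-model scores using
    full-batch gradient descent on the log loss."""
    n_models = len(base_scores[0])
    n = len(labels)
    weights = [0.0] * n_models
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_models
        grad_b = 0.0
        for row, y in zip(base_scores, labels):
            p = sigmoid(bias + sum(w * s for w, s in zip(weights, row)))
            err = p - y  # gradient of the log loss w.r.t. the logit
            for j, s in enumerate(row):
                grad_w[j] += err * s
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def stacked_predict(row, weights, bias):
    """Combine one example's base-model scores into a single probability."""
    return sigmoid(bias + sum(w * s for w, s in zip(weights, row)))


# Toy data (assumption): each row holds the scores of three hypothetical
# base models for one ncRNA-protein pair; 1 = interacting, 0 = not.
base_scores = [
    [0.8, 0.7, 0.9], [0.6, 0.9, 0.7], [0.9, 0.6, 0.8],
    [0.2, 0.3, 0.1], [0.4, 0.1, 0.3], [0.1, 0.4, 0.2],
]
labels = [1, 1, 1, 0, 0, 0]

weights, bias = fit_meta_learner(base_scores, labels)
ensemble = [stacked_predict(row, weights, bias) for row in base_scores]
print([round(p, 2) for p in ensemble])
```

In a real stacking setup the meta-learner would be fit on held-out (out-of-fold) base-model predictions rather than on the same examples the base models were trained on, to avoid leaking training labels.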

@cgreene @gwaygenomics

agitter commented 8 years ago

IPMiner has been an open tab in my browser for a few weeks now, so I'm glad someone finally got to it since I didn't.

Could you please create a new issue with the paper title as the issue title and the DOI, abstract, and your comments above as the issue body? That has helped us cross-reference papers in our discussions. I'd also be interested to hear more about how IPMiner contrasts with this paper.

Thanks for joining the discussion. There is a lot of primary literature to cover so I'm happy to see you and others from the Garmire lab contributing.

gwaybio commented 8 years ago

Thanks for posting @kumardeep27 - I am cross-referencing your issue (#96) to allow for quick comparisons.

agitter commented 8 years ago

The new paper #132 looks related and can also be cross-referenced here.