Open cgreene opened 8 years ago
Deep belief network (DBN) to predict the RNA sequences bound by RNA binding proteins (RNP). The DBN is multimodal and integrates information from primary, secondary, and tertiary features (its architecture and multimodal strategy are similar to those in #14). Engineering the features requires significant domain expertise (a similar amount of effort as in #49).
CLIP-seq datasets provide gold-standard RNA-RNP interactions. Predicting the motif where a given RNP binds is an important biological question with potentially wide-ranging implications, including medical applications. Primary, secondary, and tertiary structures were all pre-processed carefully and/or predicted by existing algorithms (e.g. RNAshapes for secondary structure).
Multi-modal DBN trained layerwise with 10-fold cross validation, dropout, and L2 regularization across 24 different CLIP-based datasets. Details on how they train with back-propagation are not clear to me (found in supplemental data). It is also not clear to me how they generate binding motifs. It appears that training/reporting performance is separate from generating motifs. They generate motifs by restricting the model to output the probability that the generated motif binds minus the probability that it does not bind. They then use this difference in p as the probability of motif binding. They report AUROC for their network with and without tertiary structure and observe that tertiary structure improves performance in almost every dataset. The code is built on deepnet and is publicly available (Python).
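For readers less familiar with the metric, here is a minimal sketch of how an AUROC comparison with and without tertiary structure could be computed. It is not the authors' code: the function name `auroc`, the labels, and both score lists are hypothetical toy values invented for illustration.

```python
def auroc(labels, scores):
    """AUROC as the probability that a random positive outranks a random
    negative, counting ties as half a win (the rank-based definition)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = bound site, 0 = unbound site.
labels = [1, 1, 1, 0, 0, 0]
# Hypothetical model outputs, NOT results from the paper.
scores_without_tertiary = [0.9, 0.6, 0.4, 0.5, 0.3, 0.2]
scores_with_tertiary = [0.9, 0.7, 0.6, 0.5, 0.3, 0.2]

auc_without = auroc(labels, scores_without_tertiary)
auc_with = auroc(labels, scores_with_tertiary)
print(f"AUROC without tertiary: {auc_without:.3f}")
print(f"AUROC with tertiary:    {auc_with:.3f}")
```

The pairwise loop is quadratic in the number of examples, which is fine for a toy illustration; on real CLIP-scale data a sort-based implementation would be the usual choice.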
There is a related recent tool (related to RBP) that uses deep neural networks along with stacked ensembling on sequence features of proteins and ncRNAs. IPMiner: hidden ncRNA-protein interaction sequential pattern mining with stacked autoencoder for accurate computational prediction https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-016-2931-8 Tool availability: [http://www.csbio.sjtu.edu.cn/bioinf/IPMiner] Interaction Pattern Miner (IPMiner) predicts sequence-based ncRNA-protein interaction motifs using stacked ensembles. Stacked autoencoders are wrapped under Random Forest (RF) models, and a final stacked ensembling step combines the prediction scores of multiple encoders. The method claims to achieve 20% higher performance than existing tools in the field.
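To make the "final stacked ensembling" step concrete, here is a minimal sketch of stacking in general: a small meta-learner is fit on the prediction scores of several base models. This is not IPMiner's implementation. The logistic-regression combiner, the function names `fit_meta_learner` and `predict`, and all numbers are assumptions chosen only for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_meta_learner(base_scores, labels, lr=0.5, epochs=5000):
    """Fit logistic-regression weights over base-model scores
    using plain batch gradient descent on the log loss."""
    n_models = len(base_scores[0])
    n = len(labels)
    weights = [0.0] * n_models
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_models
        grad_b = 0.0
        for row, y in zip(base_scores, labels):
            p = sigmoid(bias + sum(w * x for w, x in zip(weights, row)))
            err = p - y  # derivative of the log loss w.r.t. the logit
            for j, x in enumerate(row):
                grad_w[j] += err * x
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(weights, bias, row):
    """Combined (stacked) probability for one example."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, row)))


# Hypothetical scores from three base models (one column per model).
# In practice these should be out-of-fold predictions, so the
# meta-learner is not fit on scores the base models produced for
# their own training data.
base_scores = [
    [0.9, 0.8, 0.7],
    [0.8, 0.6, 0.9],
    [0.7, 0.9, 0.6],
    [0.2, 0.3, 0.4],
    [0.3, 0.1, 0.2],
    [0.4, 0.2, 0.1],
]
labels = [1, 1, 1, 0, 0, 0]  # hypothetical: 1 = interacting pair

weights, bias = fit_meta_learner(base_scores, labels)
stacked = [predict(weights, bias, row) for row in base_scores]
print([round(s, 3) for s in stacked])
```

The point of the design is that the combiner learns how much to trust each base model, rather than averaging their scores with fixed weights.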
@cgreene @gwaygenomics
IPMiner has been an open tab in my browser for a few weeks now, so I'm glad someone finally got to it since I didn't.
Could you please create a new issue with the paper title as the issue title and the DOI, abstract, and your comments above as the issue body? That has helped us cross-reference papers in our discussions. I'd also be interested to hear more about how IPMiner contrasts with this paper.
Thanks for joining the discussion. There is a lot of primary literature to cover so I'm happy to see you and others from the Garmire lab contributing.
Thanks for posting, @kumardeep27. I am cross-referencing your issue (#96) to allow for quick comparisons.
The new paper #132 looks related and can also be cross-referenced here.
https://dx.doi.org/10.1093/nar/gkv1025