gailrosen opened this issue:
Multi-Layer and Recursive Neural Networks for Metagenomic Classification http://doi.org/10.1109/TNB.2015.2461219
Recent advances in machine learning, specifically in deep learning with neural networks, have made a profound impact on fields such as natural language processing, image classification, and language modeling; however, the feasibility and potential benefits of these approaches for metagenomic data analysis have been largely under-explored. Deep learning exploits many layers that learn nonlinear feature representations, typically in an unsupervised fashion, and recent results have shown outstanding generalization performance on previously unseen data. Furthermore, some deep learning methods can also represent the structure in a data set. Consequently, deep learning and neural networks may prove to be an appropriate approach for metagenomic data. To determine whether such approaches are indeed appropriate for metagenomics, we experiment with two deep learning methods: i) a deep belief network, and ii) a recursive neural network, the latter of which provides a tree representing the structure of the data. We compare these approaches to the standard multi-layer perceptron, which is well established in the machine learning community as a powerful prediction algorithm, though it is largely absent from the metagenomics literature. We find that traditional neural networks can be quite powerful classifiers on metagenomic data compared to baseline methods such as random forests. On the other hand, while the deep learning approaches did not improve classification accuracy, they do provide the ability to learn hierarchical representations of a data set that standard classification methods do not allow. Our goal in this effort is not to determine the best algorithm in terms of accuracy, as that depends on the specific application, but rather to highlight the benefits and drawbacks of each of the approaches we discuss and to provide insight into how they can be improved for predictive metagenomic analysis.
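For context, the comparison the abstract describes (a multi-layer perceptron against a random forest baseline on sequence-derived features) can be sketched in a few lines of scikit-learn. This is only an illustrative sketch, not the paper's code: the random placeholder data, the 4-mer feature size, and the hyperparameters are assumptions for demonstration, not values from the study.

```python
# Minimal, illustrative sketch (not the paper's code): compare a multi-layer
# perceptron with a random forest baseline on k-mer count features.
# X and y below are randomly generated placeholders standing in for a real
# metagenomic k-mer count matrix and its class labels.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.poisson(5, size=(200, 256)).astype(float)  # placeholder 4-mer counts, 200 samples
y = rng.integers(0, 5, size=200)                   # placeholder labels for 5 classes

models = {
    "random_forest": RandomForestClassifier(n_estimators=500, random_state=0),
    "mlp": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(128, 64), max_iter=1000, random_state=0),
    ),
}

# 5-fold cross-validated accuracy for each classifier
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: accuracy = {scores.mean():.3f} +/- {scores.std():.3f}")
```

The scaling step is included only in the MLP pipeline because neural networks are sensitive to feature scale, whereas random forests are not.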
Hi Casey et al.!
Steve told me that you're doing a review.
We have published on using multi-layer and recursive neural networks for classifying metagenomic samples: http://ieeexplore.ieee.org/document/7219432/?arnumber=7219432&tag=1
Also, during my sabbatical, I was looking at how to apply LSTMs to nanopore sequencing.
Could I contribute something to this? (The problem is that Nov. 1 is such a short deadline, since I just found out about it!)
Best wishes, Gail

@gailrosen We changed our timeline today. We now aim to have the paper ready by December 1, so there is plenty of time to contribute. I'm going to (attempt to) organize the participants soon. The discussion in #88 will help fill you in on the goals of the review.
We're still deciding which papers to include. I haven't read yours yet, but at a glance it looks like the human metagenome application could fit into the discussion of studying disease. Your conclusions about random forests relative to the other methods could also be relevant. We aim to present a balanced overview of neural networks, so it's good to point out that random forests achieve better accuracy in certain settings.