greenelab / deep-review

A collaboratively written review paper on deep learning, genomics, and precision medicine
https://greenelab.github.io/deep-review/

Exploring Single-Cell Data with Deep Multitasking Neural Networks #733

Open mattamodio opened 6 years ago

mattamodio commented 6 years ago

Exploring Single-Cell Data with Deep Multitasking Neural Networks

Matthew Amodio, Krishnan Srinivasan, David van Dijk, Hussein Mohsen, Kristina Yim, Rebecca Muhle, Kevin R. Moon, Susan Kaech, Ryan Sowell, Ruth Montgomery, James Noonan, Guy Wolf, Smita Krishnaswamy

Published on bioRxiv.

alxndrkalinin commented 6 years ago

@the1mane1event please take a look at the format we use for the new paper type issues and add the link to the paper and the abstract

agitter commented 6 years ago

@the1mane1event #725 is a recent example of the format that @alxndrkalinin mentioned. The DOI link (and the bioRxiv link if the DOI is not yet live) and the abstract are the essentials. Any summary commentary you have is an optional bonus. You can edit your original post to keep the links at the top of this thread.

(and good to hear from you again!)

kmoon3 commented 6 years ago

@agitter @alxndrkalinin, the bioRxiv link is available now. I think @the1mane1event may be traveling so I'll post what you need here. He can update his post later if necessary. Here's the link and abstract.

https://doi.org/10.1101/237065

Handling the vast amounts of single-cell RNA-sequencing and CyTOF data, which are now being generated in patient cohorts, presents a computational challenge due to the noise, complexity, sparsity and batch effects present. Here, we propose a unified deep neural network-based approach to automatically process and extract structure from these massive datasets. Our unsupervised architecture, called SAUCIE (Sparse Autoencoder for Unsupervised Clustering, Imputation, and Embedding), simultaneously performs several key tasks for single-cell data analysis including 1) clustering, 2) batch correction, 3) visualization, and 4) denoising/imputation. SAUCIE is trained to recreate its own input after reducing its dimensionality in a 2-D embedding layer which can be used to visualize the data. Additionally, it uses two novel regularizations: (1) an information dimension regularization to penalize entropy as computed on normalized activation values of the layer, and thereby encourage binary-like encodings that are amenable to clustering and (2) a Maximal Mean Discrepancy penalty to correct batch effects. Thus SAUCIE has a single architecture that denoises, batch-corrects, visualizes and clusters data using a unified representation. We show results on artificial data where ground truth is known, as well as mass cytometry data from dengue patients, and single-cell RNA-sequencing data from embryonic mouse brain.
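
For readers skimming the thread, here is a rough sketch of the two regularization ideas the abstract names: an entropy penalty on normalized layer activations, and a maximum mean discrepancy (MMD) penalty between two batches. This is not the SAUCIE implementation. The function names (`entropy_penalty`, `rbf_kernel`, `mmd2`), the Gaussian kernel, the bandwidth `sigma`, and the assumption of non-negative (e.g. ReLU) activations are all illustrative choices that do not come from the paper.

```python
import numpy as np

def entropy_penalty(activations, eps=1e-12):
    """Mean Shannon entropy of each row after normalizing it to sum to 1.

    Assumes non-negative activations (e.g. ReLU outputs). Low entropy means
    a few units dominate, i.e. a near-binary code.
    """
    a = np.asarray(activations, dtype=float)
    p = a / (a.sum(axis=1, keepdims=True) + eps)
    return float(np.mean(-np.sum(p * np.log(p + eps), axis=1)))

def rbf_kernel(x, y, sigma=1.0):
    """Gaussian kernel matrix between rows of x and rows of y (assumed kernel)."""
    sq = (np.sum(x**2, axis=1)[:, None]
          + np.sum(y**2, axis=1)[None, :]
          - 2.0 * x @ y.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def mmd2(x, y, sigma=1.0):
    """Biased estimate of the squared MMD between samples x and y."""
    return float(rbf_kernel(x, x, sigma).mean()
                 + rbf_kernel(y, y, sigma).mean()
                 - 2.0 * rbf_kernel(x, y, sigma).mean())

# Toy usage: a one-hot-like code has lower entropy than a uniform one,
# and a shifted "batch" has a larger MMD than an identical one.
rng = np.random.default_rng(0)
batch_a = rng.normal(size=(50, 2))
batch_b = batch_a + 3.0  # hypothetical batch effect: a constant shift
print(entropy_penalty([[1.0, 0.0, 0.0, 0.0]]), entropy_penalty([[1.0, 1.0, 1.0, 1.0]]))
print(mmd2(batch_a, batch_a), mmd2(batch_a, batch_b))
```

In a training loop, terms like these would presumably be added to the autoencoder's reconstruction loss with tunable weights. The layers they are applied to and the weights used are details to check in the paper itself.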