ttgump / scDeepCluster

scDeepCluster for Single Cell RNA-seq data
Apache License 2.0

scDeepCluster

scDeepCluster is a model-based deep embedding clustering method for single-cell RNA-seq data. See details in our paper, "Clustering single-cell RNA-seq data with a model-based deep learning approach", published in Nature Machine Intelligence: https://www.nature.com/articles/s42256-019-0037-0.

Network diagram

[Figure: scDeepCluster network diagram]

Requirements

Python --- 3.6.3

Keras --- 2.1.4

Tensorflow --- 1.1.0

Scanpy --- 1.0.4

Nvidia Tesla K80 (12G)

Please note that with different versions, the results reported in our paper may not be reproducible.

Usage

python scDeepCluster.py --data_file data.h5 --n_clusters 10

Set data_file to the path of the data file (stored in h5 format, with two components X and Y, where X is the cell-by-gene count matrix and Y holds the true labels), and n_clusters to the number of clusters.
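As an illustration of the input format described above, here is a minimal sketch that writes and reads an h5 file with the two components X and Y. It assumes X and Y are stored as top-level datasets and uses h5py and NumPy; the helper names (write_example_h5, read_h5), the toy Poisson counts, and the dtypes are hypothetical and not taken from scDeepCluster.py.

```python
import os
import tempfile

import h5py
import numpy as np


def write_example_h5(path, n_cells=200, n_genes=50, n_clusters=10, seed=0):
    """Write a toy count matrix X (cells x genes) and labels Y to an h5 file.

    Assumption: X and Y are top-level datasets named "X" and "Y".
    """
    rng = np.random.RandomState(seed)
    # Toy raw counts; real input would be the cell-by-gene count matrix.
    X = rng.poisson(lam=2.0, size=(n_cells, n_genes)).astype(np.float32)
    # One integer label per cell (the "true labels" component).
    Y = rng.randint(0, n_clusters, size=n_cells)
    with h5py.File(path, "w") as f:
        f.create_dataset("X", data=X)
        f.create_dataset("Y", data=Y)
    return X, Y


def read_h5(path):
    """Read X and Y back as NumPy arrays."""
    with h5py.File(path, "r") as f:
        X = np.array(f["X"])
        Y = np.array(f["Y"])
    return X, Y


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "data.h5")
    write_example_h5(path)
    X, Y = read_h5(path)
    print(X.shape, Y.shape)
```

A file written this way has the shape of input that --data_file expects according to the description above; whether the script requires specific dtypes is not stated here, so check your data against scDeepCluster.py before running.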

The final output reports the clustering performance. Here is an example on 10X PBMC scRNA-seq data:

Final: ACC= 0.8100, NMI= 0.7736, ARI= 0.7841
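For readers who want to score their own cluster assignments against true labels, the following is a sketch of how the three reported metrics are commonly computed. It assumes ACC is the usual clustering accuracy under the best one-to-one mapping between predicted clusters and true labels (found with the Hungarian algorithm), and that NMI and ARI are the scikit-learn definitions; the function names cluster_acc and report are hypothetical, and this is not necessarily the exact code used in scDeepCluster.py.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score


def cluster_acc(y_true, y_pred):
    """Clustering accuracy under the best one-to-one cluster/label mapping.

    Assumption: labels are non-negative integers.
    """
    y_true = np.asarray(y_true).astype(np.int64)
    y_pred = np.asarray(y_pred).astype(np.int64)
    n = max(y_true.max(), y_pred.max()) + 1
    # w[i, j] counts cells in predicted cluster i that carry true label j.
    w = np.zeros((n, n), dtype=np.int64)
    for p, t in zip(y_pred, y_true):
        w[p, t] += 1
    # Hungarian algorithm on the negated counts maximizes matched cells.
    row, col = linear_sum_assignment(-w)
    return w[row, col].sum() / y_true.size


def report(y_true, y_pred):
    """Return (ACC, NMI, ARI) for a predicted clustering."""
    acc = cluster_acc(y_true, y_pred)
    nmi = normalized_mutual_info_score(y_true, y_pred)
    ari = adjusted_rand_score(y_true, y_pred)
    return acc, nmi, ari


if __name__ == "__main__":
    y_true = [0, 0, 1, 1, 2, 2]
    y_pred = [1, 1, 0, 0, 2, 0]  # cluster ids permuted, one cell misassigned
    acc, nmi, ari = report(y_true, y_pred)
    print("Final: ACC= %.4f, NMI= %.4f, ARI= %.4f" % (acc, nmi, ari))
```

All three metrics are invariant to how cluster ids are numbered, which is why ACC needs the matching step rather than a direct comparison of label vectors.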

Pytorch version

I recommend the PyTorch version, which adds some new features: 1. automatic estimation of the number of clusters after pretraining; 2. clustering of datasets from different batches.

See details at https://github.com/ttgump/scDeepCluster_pytorch

Raw data

The raw data used in the paper can be found at: https://figshare.com/articles/dataset/scDeepCluster_supporting_data/17158025

Online app

Online app website: https://app.superbio.ai/apps/107

Contact

Tian Tian tiantianwhu@163.com