nanawei11 / Secuer

A clustering method for scRNA-seq data
MIT License

Secuer: ultrafast, scalable and accurate clustering of single-cell RNA-seq data

Secuer is a superfast and scalable clustering algorithm, based on spectral clustering, for the analysis of (ultra-)large scRNA-seq data. Secuer-consensus is a consensus clustering algorithm that uses Secuer as a subroutine. Secuer can also be applied to other large-scale omics data in two-dimensional form (features by observations). For more details, see secuer.

The workflow of Secuer:

Installation

Secuer is available as a Python package.

# Option 1: install into a fresh conda environment
conda create -n secuer python=3.9
conda activate secuer
pip install secuer matplotlib pandas scanpy igraph louvain pyyaml

# Option 2: install with pip only
pip install secuer
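To confirm that the installation worked, a quick check like the following can help. It is a minimal sketch using only the Python standard library; the helper names `is_installed` and `installed_version` are hypothetical and are not part of Secuer. The module names come from the install commands above, with `yaml` being the import name of the `pyyaml` package.

```python
import importlib.util
from importlib import metadata


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located on the import path."""
    return importlib.util.find_spec(module_name) is not None


def installed_version(dist_name: str):
    """Return the installed version of a pip distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Import names of the packages listed in the install command
    # (pyyaml is imported as "yaml").
    for name in ("secuer", "scanpy", "pandas", "matplotlib",
                 "igraph", "louvain", "yaml"):
        status = "found" if is_installed(name) else "MISSING"
        print(f"{name:12s} {status}")
    print("secuer version:", installed_version("secuer"))
```

Run it inside the activated `secuer` environment; any package reported as `MISSING` can be installed with `pip install`.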

Run Secuer (usage)

Essential parameters

To run Secuer with default parameters, you only need to specify:

options

You can also specify the following options:

Example of running Secuer with custom parameters:

$ Secuer S -i ./example_data/Biase_k3_FPKM_scRNA.csv --yaml ./config.yaml -o ./Biase_result -p 1000 --knn 5 --transpose
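When the command has to be run for several datasets, it can be convenient to assemble it in Python. The sketch below only uses flags that appear in this README (`-i`, `--yaml`, `-o`, `-p`, `--knn`, `-M`, `--transpose`) and the two sub-commands `S` and `C`; the function name `build_secuer_command` and its argument names are hypothetical, and the function only builds the argument list without executing anything.

```python
import shlex


def build_secuer_command(input_csv, yaml_path, output_dir, p, knn,
                         transpose=False, mode="S", m=None):
    """Build the Secuer command line as a list of arguments.

    mode is "S" (Secuer) or "C" (Secuer-consensus); m is passed as -M
    and is only added when it is given.
    """
    if mode not in ("S", "C"):
        raise ValueError("mode must be 'S' or 'C'")
    cmd = ["Secuer", mode,
           "-i", str(input_csv),
           "--yaml", str(yaml_path),
           "-o", str(output_dir),
           "-p", str(p),
           "--knn", str(knn)]
    if m is not None:
        cmd += ["-M", str(m)]
    if transpose:
        cmd.append("--transpose")
    return cmd


if __name__ == "__main__":
    cmd = build_secuer_command("./example_data/Biase_k3_FPKM_scRNA.csv",
                               "./config.yaml", "./Biase_result",
                               p=1000, knn=5, transpose=True)
    # Print the command in copy-pasteable shell form.
    print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once the `Secuer` executable is on the `PATH`.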

Output files

  1. output/SecuerResult.txt is the clustering result.
  2. output/SecuerResult.h5ad is the preprocessed data with the clustering result.
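A quick way to inspect the text output is to count how many cells fall into each cluster. The sketch below ASSUMES that the result file holds one cluster label per non-empty line; this layout is not documented here, so check the file first and adapt the parsing if it differs. The helper names `read_labels` and `cluster_sizes` are hypothetical.

```python
from collections import Counter
from pathlib import Path


def read_labels(path):
    """Read cluster labels, ASSUMING one label per non-empty line."""
    text = Path(path).read_text()
    return [line.strip() for line in text.splitlines() if line.strip()]


def cluster_sizes(labels):
    """Return a dict {label: number of cells}, largest cluster first."""
    return dict(Counter(labels).most_common())


if __name__ == "__main__":
    result_file = Path("./Biase_result/SecuerResult.txt")
    if result_file.exists():
        for label, size in cluster_sizes(read_labels(result_file)).items():
            print(f"cluster {label}: {size} cells")
    else:
        print(f"{result_file} not found; run Secuer first")
```

The `.h5ad` file can be opened with `scanpy.read_h5ad`; listing `adata.obs.columns` shows which column stores the clustering result.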

Run Secuer-consensus (usage)

Essential parameters

To run Secuer-consensus with default parameters, you only need to specify:

options

You can also specify the following options:

Example of running Secuer-consensus:

$ Secuer C -i ./example_data/Biase_k3_FPKM_scRNA.csv --yaml ./config.yaml -o ./Biase_conresult -p 900 --knn 5 -M 7 --transpose

Output files

  1. output/SecuerConsensusResult.txt is the clustering result.
  2. output/SecuerConsensusResult.h5ad is the preprocessed data with the clustering result.

Or run Secuer in Python

import scanpy as sc
import secuer as sr

# Load the expression matrix; .T transposes it so that rows are cells and columns are genes
data = sc.read('example_data/Biase_k3_FPKM_scRNA.csv').T

# Data preprocessing
sc.pp.filter_genes(data, min_counts=1)
sc.pp.filter_cells(data, min_counts=1)
sc.pp.normalize_total(data, target_sum=1e4)
sc.pp.log1p(data)
sc.pp.highly_variable_genes(data, min_mean=0.0125, max_mean=3, min_disp=0.5)
# Copy the subset so that scaling modifies real data rather than a view
data = data[:, data.var.highly_variable].copy()
sc.pp.scale(data, max_value=10)
sc.tl.pca(data)

# Run Secuer on the PCA embedding
fea = data.obsm['X_pca']
res = sr.secuer(fea=fea,
                Knn=5,
                multiProcessState=True,
                num_multiProcesses=4)

# Run Secuer-consensus on the same embedding
resC = sr.secuerconsensus(run_secuer=True,
                          fea=fea,
                          Knn=5,
                          M=5,
                          multiProcessState=True,
                          num_multiProcesses=4)
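To see how closely the Secuer and Secuer-consensus results agree, the two label vectors can be compared with the adjusted Rand index (ARI). The sketch below is a plain standard-library implementation of the ARI and is not part of Secuer; the function name `adjusted_rand_index` is hypothetical. It ASSUMES that `res` and `resC` can each be converted to a sequence with one cluster label per cell, which should be verified against the values actually returned.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two clusterings of the same cells.

    1.0 means identical partitions (cluster names may differ);
    values near 0 mean agreement no better than chance.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("label vectors must have the same length")
    n = len(labels_a)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0
    # Number of cell pairs placed together in both, in A, and in B.
    pairs_joint = sum(comb(c, 2)
                      for c in Counter(zip(labels_a, labels_b)).values())
    pairs_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    pairs_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = pairs_a * pairs_b / total_pairs
    maximum = (pairs_a + pairs_b) / 2
    if maximum == expected:
        # Both partitions are trivial (one cluster, or all singletons).
        return 1.0
    return (pairs_joint - expected) / (maximum - expected)


if __name__ == "__main__":
    # Toy labels; replace with list(res) and list(resC) if the
    # one-label-per-cell assumption holds.
    print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
```

If scikit-learn is installed, `sklearn.metrics.adjusted_rand_score` computes the same quantity.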

Citation

Wei N, Nie Y, Liu L, Zheng X, Wu H-J (2022) Secuer: Ultrafast, scalable and accurate clustering of single-cell RNA-seq data. PLOS Computational Biology 18(12): e1010753. https://doi.org/10.1371/journal.pcbi.1010753.