Significance Analysis for Clustering Single-Cell RNA-Sequencing Data

Contact information: igrabski[at]nygenome[dot]org

We introduce a model-based hypothesis testing approach for evaluating single-cell RNA-sequencing (scRNA-seq) clusters. This approach is implemented in two ways: (1) a stand-alone clustering pipeline with built-in hypothesis testing to produce clusters corresponding to distinct cell populations and (2) a post-hoc method that can evaluate the statistical significance of any provided set of clusters.

Our package can be installed as follows:

# install.packages("devtools")
devtools::install_github("igrabski/sc-SHC")

Usage

To run the stand-alone clustering pipeline, sc-SHC (single-cell significance of hierarchical clustering), use the following command:

library(scSHC)
clusters <- scSHC(data)

Here, data should be a (possibly sparse) matrix where the rows are genes and the columns are cells. Optionally, the following parameters can be adjusted:

To evaluate the significance of any provided set of clusters, use the following command:

library(scSHC)
new_clusters <- testClusters(data, as.character(clusters))

Here, data is the same as before, and clusters should be a character vector of cluster labels, corresponding to cells in the same order as the columns of the data matrix. The same parameters as above can be adjusted. Additionally, if desired, a given set of genes can be provided through the parameter var.genes rather than allowing our approach to identify informative genes on its own.
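
The input shapes described above can be checked before calling either function. The following is a minimal sketch on simulated toy counts, not real scRNA-seq data. The sizes, gene and cell names, and the labels "A" and "B" are made up for illustration, and the commented-out var.genes call assumes that the parameter accepts gene names matching the row names of data, which should be confirmed against the package documentation.

```r
set.seed(1)
n_genes <- 200
n_cells <- 50

# Toy count matrix: rows are genes, columns are cells (simulated, illustrative only)
data <- matrix(rpois(n_genes * n_cells, lambda = 0.5),
               nrow = n_genes, ncol = n_cells,
               dimnames = list(paste0("gene", seq_len(n_genes)),
                               paste0("cell", seq_len(n_cells))))

# Optionally store the matrix sparsely if the Matrix package is available
if (requireNamespace("Matrix", quietly = TRUE)) {
  data <- Matrix::Matrix(data, sparse = TRUE)
}

# Hypothetical cluster labels: one per cell, in the same order as the columns
clusters <- as.character(rep(c("A", "B"), length.out = n_cells))

# Sanity checks on the inputs described above
stopifnot(is.character(clusters), length(clusters) == ncol(data))

# With scSHC installed, the calls from this README would then be:
# library(scSHC)
# new_clusters <- testClusters(data, clusters)
# Assumption: var.genes takes gene names that match rownames(data)
# new_clusters <- testClusters(data, clusters, var.genes = rownames(data)[1:100])
```

The stopifnot check fails early if the label vector and the matrix columns are out of step, which is the most likely input mistake when labels come from a separate clustering tool.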