sqjin / scAI

An unsupervised approach for the integrative analysis of single-cell multi-omics data
GNU General Public License v3.0

scAI: a single cell Aggregation and Integration method for analyzing single cell multi-omics data

Once the single cell multi-omics data are decomposed into multiple biologically relevant factors, the package provides functionality for further data exploration, analysis, and visualization.

Overview of scAI

Check out our paper (Suoqin Jin#, Lihua Zhang# & Qing Nie*, Genome Biology, 2020) for the detailed methods and applications.

Packages

scAI has been implemented as both an R package and a MATLAB package under the GPL-3 license. In each package, we provide example workflows that outline the key steps and unique features of scAI. The MATLAB package and examples are available here.

Installation of R package

Install from GitHub using devtools

devtools::install_github("sqjin/scAI")

Install from R source code

Download the source code here and run the following in R:

install.packages(path_to_file, type = 'source', repos = NULL) # path_to_file is the full path to the downloaded source file, including the file name

This website shows other ways for building and installing an R package.

Examples and Walkthroughs

All the R Markdown files used to generate the walkthroughs can be found in the /examples directory.

Suggestions for speeding up on large-scale datasets

Using the Python implementation of the scAI model

object <- run_scAI(object, K, do.fast = TRUE)

Feature selection

Feature selection can reduce the running time of both the scAI model and downstream analyses such as dimension reduction.

The most informative genes can be selected based on their average expression and Fano factor (see our paper for details).

object <- selectFeatures(object, assay = "RNA")
object <- run_scAI(object, K, do.fast = TRUE, hvg.use1 = TRUE)
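To make the selection criterion concrete, here is a minimal base-R sketch of ranking genes by average expression and Fano factor (variance divided by mean). It is not the code behind selectFeatures: the toy count matrix and the cutoffs `min_mean` and `min_fano` are assumptions chosen only for illustration.

```r
# Toy count matrix: 4 genes (rows) x 6 cells (columns). Values are made up.
counts <- matrix(
  c(5, 5, 5, 5, 5, 5,     # geneA: expressed, no variability
    0, 10, 0, 10, 0, 10,  # geneB: expressed, highly variable
    0, 0, 0, 0, 0, 1,     # geneC: barely expressed
    2, 4, 2, 4, 2, 4),    # geneD: expressed, mildly variable
  nrow = 4, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC", "geneD"), paste0("cell", 1:6))
)

gene_mean <- rowMeans(counts)
gene_var <- apply(counts, 1, var)

# Fano factor = variance / mean; genes with zero mean get NA and are excluded below
fano <- ifelse(gene_mean > 0, gene_var / gene_mean, NA_real_)

# Hypothetical cutoffs, chosen for this toy example only
min_mean <- 0.5
min_fano <- 1

selected <- names(which(gene_mean >= min_mean & !is.na(fano) & fano >= min_fano))
print(selected)
```

Only genes that are both sufficiently expressed and more variable than their mean would suggest pass both filters; on real data the cutoffs would need to be tuned to the dataset.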

Unlike scRNA-seq data, the largely binary nature of scATAC-seq data makes it challenging to perform ‘variable’ feature selection. One option is to select the chromosome regions near the informative genes.

object <- selectFeatures(object, assay = "RNA")
loci.use <- searchGeneRegions(genes = object@var.features[[1]], species = "mouse")
object@var.features[[2]] <- loci.use
object <- run_scAI(object, K, do.fast = TRUE, hvg.use1 = TRUE, hvg.use2 = TRUE)
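The idea of keeping only loci near informative genes can be sketched in base R as follows. This is an illustration of the concept, not the implementation of searchGeneRegions: the gene coordinates, the "chr:start-end" peak naming, and the `window` size are all assumptions made for the example.

```r
# Hypothetical gene coordinates (chromosome and transcription start site)
genes <- data.frame(
  gene = c("Gene1", "Gene2"),
  chr = c("chr1", "chr2"),
  tss = c(100000, 500000),
  stringsAsFactors = FALSE
)

# Hypothetical peak names, assumed to follow a "chr:start-end" format
peaks <- c("chr1:95000-96000", "chr1:300000-301000",
           "chr2:520000-521000", "chr3:100000-101000")

# Parse the peak names into chromosome, start and end
parts <- do.call(rbind, strsplit(peaks, "[:-]"))
peak_chr <- parts[, 1]
peak_start <- as.numeric(parts[, 2])
peak_end <- as.numeric(parts[, 3])

window <- 50000  # assumed half-width around each TSS, for illustration only

# A peak is kept if it overlaps the window around any gene on the same chromosome
near_gene <- vapply(seq_along(peaks), function(i) {
  same_chr <- genes$chr == peak_chr[i]
  overlaps <- peak_end[i] >= genes$tss - window & peak_start[i] <= genes$tss + window
  any(same_chr & overlaps)
}, logical(1))

loci_near_genes <- peaks[near_gene]
print(loci_near_genes)
```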

Another option is to use only the top n% of features or to remove features present in fewer than n cells. This method is used in Signac.
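Both filters can be sketched in base R on a toy binary accessibility matrix. This is only an illustration of the idea and does not use Signac or scAI functions; the matrix and the cutoffs `min_cells` and `top_percent` are assumptions.

```r
# Toy binary accessibility matrix: 5 peaks (rows) x 8 cells (columns)
access <- matrix(
  c(1, 1, 1, 1, 1, 1, 0, 0,   # peak1: open in 6 cells
    1, 0, 0, 0, 0, 0, 0, 0,   # peak2: open in 1 cell
    1, 1, 1, 0, 0, 0, 0, 0,   # peak3: open in 3 cells
    0, 0, 0, 0, 0, 0, 0, 0,   # peak4: open in 0 cells
    1, 1, 1, 1, 0, 0, 0, 0),  # peak5: open in 4 cells
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("peak", 1:5), paste0("cell", 1:8))
)

# Number of cells in which each peak is detected
cells_per_peak <- rowSums(access > 0)

# Option 1: drop peaks detected in fewer than min_cells cells
min_cells <- 3  # hypothetical cutoff
kept_by_count <- names(cells_per_peak)[cells_per_peak >= min_cells]

# Option 2: keep the top n% of peaks, ranked by the number of cells they are open in
top_percent <- 40  # hypothetical cutoff
n_keep <- ceiling(length(cells_per_peak) * top_percent / 100)
kept_by_rank <- names(sort(cells_per_peak, decreasing = TRUE))[seq_len(n_keep)]

print(kept_by_count)
print(kept_by_rank)
```

The resulting peak names could then be supplied as the second set of variable features, in the same way as loci.use in the previous example.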

Additional installation steps (if needed)

scAI provides functionality for further data exploration, analysis, and visualization. These features require a few additional packages:

library(devtools)
install_github('linxihui/NNLM')
install_github("yanwu2014/swne")
install_github("jokergoo/ComplexHeatmap")
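Before running the commands above, it can help to check which of these packages are already available. The helper below is a small sketch using base R's requireNamespace; `find_missing` is a hypothetical name introduced here and is not part of scAI.

```r
# Packages named in the installation commands above
needed <- c("NNLM", "swne", "ComplexHeatmap")

# Return the subset of `pkgs` that cannot be loaded from the current library paths
find_missing <- function(pkgs) {
  is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!is_installed]
}

missing_pkgs <- find_missing(needed)
if (length(missing_pkgs) > 0) {
  message("Still to install: ", paste(missing_pkgs, collapse = ", "))
}
```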

Using UMAP and FIt-SNE is recommended for computational efficiency when using reducedDims on very large datasets.

-- Install the UMAP Python package: pip install umap-learn. Please check here if you run into any trouble.

-- Install the FIt-SNE R package: installing and compiling it requires the FIt-SNE software and FFTW. For detailed installation instructions, please visit this page.

Troubleshooting on the R Compiler Tools for Rcpp on macOS

If you get the error "clang: error: unsupported option '-fopenmp'" when installing the R package, please check the configuration in ~/.R/Makevars and see this post for detailed configuration. Alternatively, you can reinstall R, because the -fopenmp option is usually added by R automatically if OpenMP is available.

If you are using macOS Mojave (10.14), you might get the error "/usr/local/clang6/bin/../include/c++/v1/math.h:301:15: fatal error: 'math.h' file not found"; please check the post. This error can be solved by running the following in the terminal:

sudo installer -pkg \
/Library/Developer/CommandLineTools/Packages/macOS_SDK_headers_for_macOS_10.14.pkg \
-target /

Help

If you have any problems, comments or suggestions, please contact Suoqin Jin (suoqin.jin@uci.edu) or Lihua Zhang (lihuaz1@uci.edu).

How should I cite scAI?

Jin, S., Zhang, L. & Nie, Q. scAI: an unsupervised approach for the integrative analysis of parallel single-cell transcriptomic and epigenomic profiles. Genome Biol 21, 25 (2020). https://doi.org/10.1186/s13059-020-1932-8