tarot0410 / BREMSC

Novel joint clustering method with scRNA-seq and CITE-seq data

BREMSC

BREMSC is an R package (with core functions jointDIMMSC and BREMSC) for joint clustering of droplet-based scRNA-seq and CITE-seq data. jointDIMMSC is a direct extension of DIMMSC that assumes full independence between single-cell RNA and surface protein data. To account for the correlation between the two data sources, we further developed BREMSC, which uses random effects to integrate them. The package works directly on raw count data from droplet-based scRNA-seq and CITE-seq experiments without any data transformation, and it provides clustering uncertainty for each cell.

Version: 0.2.0 (Date: 2020-03-02)

See the homepage at Wei Chen's Lab.

Installation

Install BREMSC from GitHub

install.packages("devtools")
library(devtools)
install_github("tarot0410/BREMSC")

Alternatively, install from the terminal (first download the BREMSC source file from Wei Chen's Lab website):

R CMD INSTALL BREMSC_0.1.0.tar

Function jointDIMMSC

Introduction

jointDIMMSC is developed as an extension of DIMMSC and assumes full independence between single-cell RNA and surface protein data. We construct the joint likelihood of the two data sources as the product of their individual likelihoods and use an EM algorithm for parameter inference. In practice, jointDIMMSC runs much faster than BREMSC, but its model assumption is more stringent.
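To make the independence assumption concrete, here is a minimal base-R sketch (not the package's implementation). It assumes each data source follows a Dirichlet-multinomial model within each cluster, in the spirit of DIMMSC; the helper names `dm_loglik`, `joint_loglik` and `cell_posterior` are hypothetical. Because the joint likelihood is a product, the per-source log-likelihoods simply add, and normalizing over clusters gives a per-cell posterior of the kind an EM-style method reports as clustering uncertainty.

```r
# Hypothetical helper (not part of BREMSC): Dirichlet-multinomial
# log-likelihood of one cell's count vector x given parameter vector alpha.
# The multinomial coefficient is dropped because it does not depend on
# the cluster, so it cancels when comparing clusters.
dm_loglik <- function(x, alpha) {
  n <- sum(x)
  A <- sum(alpha)
  lgamma(A) - lgamma(n + A) + sum(lgamma(x + alpha) - lgamma(alpha))
}

# Full independence: joint likelihood = RNA likelihood * protein likelihood,
# so on the log scale the two terms add.
joint_loglik <- function(xRNA, xADT, alphaRNA, alphaADT) {
  dm_loglik(xRNA, alphaRNA) + dm_loglik(xADT, alphaADT)
}

# Posterior cluster probabilities for one cell (an E-step-style quantity).
# alphaRNA: G x K matrix, alphaADT: D x K matrix, pi: length-K mixing weights.
cell_posterior <- function(xRNA, xADT, alphaRNA, alphaADT, pi) {
  logp <- vapply(seq_along(pi), function(k) {
    log(pi[k]) + joint_loglik(xRNA, xADT, alphaRNA[, k], alphaADT[, k])
  }, numeric(1))
  w <- exp(logp - max(logp))  # subtract the max for numerical stability
  w / sum(w)
}

# Toy example with 2 genes, 2 proteins and K = 2 clusters
alphaRNA <- cbind(c(10, 1), c(1, 10))
alphaADT <- cbind(c(5, 1), c(1, 5))
cell_posterior(c(8, 0), c(6, 1), alphaRNA, alphaADT, pi = c(0.5, 0.5))
```

In this toy example both the RNA and the protein counts point toward the first cluster, so its posterior probability dominates.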

Usage

jointDIMMSC(dataProtein, dataRNA, K, useGene = 100, maxiter = 100, tol = 1e-04, lik.tol = 0.01)

Arguments

Values

jointDIMMSC returns a list object containing:

Example:

# First load BREMSC R package
library(BREMSC)

# Next load the example simulated data (dataADT: protein data; dataRNA: RNA data)
data("dataADT")
data("dataRNA")

# Test run of jointDIMMSC
testRun <- jointDIMMSC(dataADT, dataRNA, K=4)

Function BREMSC

Introduction

Like jointDIMMSC, BREMSC uses separate Dirichlet mixture priors to characterize variation across cell types in each data source, but it additionally uses random effects to link the two data sources. Parameters are estimated in a Bayesian framework with Gibbs sampling, so BREMSC is much slower than jointDIMMSC. For real applications, nMCMC > 500 is suggested, and running more than 3 chains (set by the nChains parameter) is strongly recommended for better stability. Prescreening RNA features is also necessary to reduce computation time and noise for BREMSC; fewer than 1000 RNA features are recommended.
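One simple way to prescreen RNA features is to keep the most variable genes. The sketch below is an assumption, not a function shipped with BREMSC: `select_top_genes` is a hypothetical helper, it assumes `dataRNA` is a gene-by-cell count matrix, and it ranks genes by the variance of log1p-transformed counts while still returning the raw counts, since BREMSC works on raw data.

```r
# Hypothetical helper (not part of BREMSC): keep the nTop most variable genes.
# Assumes dataRNA is a gene-by-cell matrix of raw counts.
select_top_genes <- function(dataRNA, nTop = 1000) {
  nTop <- min(nTop, nrow(dataRNA))
  # log1p is used only for ranking; the returned matrix keeps the raw counts
  geneVar <- apply(log1p(dataRNA), 1, var)
  keep <- order(geneVar, decreasing = TRUE)[seq_len(nTop)]
  dataRNA[keep, , drop = FALSE]
}

# Toy gene-by-cell count matrix (5 genes, 4 cells)
toyRNA <- rbind(g1 = c(0, 0, 0, 0), g2 = c(0, 10, 0, 10), g3 = c(1, 1, 1, 1),
                g4 = c(0, 5, 0, 5), g5 = c(2, 2, 2, 3))
select_top_genes(toyRNA, nTop = 2)
```

Other screening criteria (for example, dropping genes detected in very few cells first) are equally reasonable; variance ranking is just one common choice.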

Usage

BREMSC(dataProtein, dataRNA, K, nChains = 3, nMCMC = 1000, sd_alpha = c(0.5, 1.5), sd_b = c(0.2, 1), sigmaB = 0.8)

Arguments

Values

BREMSC returns a list object containing:

Example:

# First load BREMSC R package
library(BREMSC)

# Next load the example simulated data (dataADT: protein data; dataRNA: RNA data)
data("dataADT")
data("dataRNA")

# Test run of BREMSC (using small number of MCMC here to save time)
testRun <- BREMSC(dataADT, dataRNA, K=4, nChains=2, nMCMC=100)

# Check convergence of the log likelihood
plot(testRun$vecLogLik, type = "l", xlab = "MCMC Iterations", ylab = "Log likelihood") # if the log likelihood does not look converged, consider increasing nMCMC
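Because more than one chain is recommended, a common complement to eyeballing the trace is the Gelman-Rubin potential scale reduction factor (R-hat), which compares between-chain and within-chain variance; values close to 1 suggest the chains agree. This is a generic MCMC diagnostic swapped in here, not something BREMSC is documented to compute. The sketch assumes you already have one log-likelihood trace per chain as the columns of a matrix (how to obtain per-chain traces from the BREMSC output is not covered here), and `gelman_rubin` is a hypothetical helper.

```r
# Hypothetical helper: basic Gelman-Rubin R-hat for one scalar quantity.
# chains: an n x m matrix, n post-burn-in iterations (rows) for m chains (columns).
gelman_rubin <- function(chains) {
  n <- nrow(chains)
  chainMeans <- colMeans(chains)
  B <- n * var(chainMeans)               # between-chain variance
  W <- mean(apply(chains, 2, var))       # mean within-chain variance
  varHat <- (n - 1) / n * W + B / n      # pooled variance estimate
  sqrt(varHat / W)
}

# Toy traces: two chains that agree, and two chains stuck at different levels
agree <- cbind(sin(1:100), sin(1:100 + 0.5))
disagree <- cbind(sin(1:100), sin(1:100) + 5)
gelman_rubin(agree)     # close to 1
gelman_rubin(disagree)  # well above 1, suggesting more iterations are needed
```

A common rule of thumb treats R-hat below about 1.1 as acceptable.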

Publications

Contact

Xinjun Wang (xiw119@pitt.edu), Wei Chen.