BREMSC is an R package (with core functions jointDIMMSC and BREMSC) for joint clustering of droplet-based scRNA-seq and CITE-seq data. jointDIMMSC is a direct extension of DIMMSC and assumes full independence between single-cell RNA and surface protein data. To account for the correlation between the two data sources, we further developed BREMSC, which uses random effects to link them. The package works directly on raw count data from droplet-based scRNA-seq and CITE-seq experiments, without any data transformation, and reports clustering uncertainty for each cell.
Version: 0.2.0 (Date: 2020-03-02)
Install BREMSC from GitHub
install.packages("devtools")
library(devtools)
install_github("tarot0410/BREMSC")
Or install from the terminal (first download the BREMSC source file from Wei Chen's Lab website)
R CMD INSTALL BREMSC_0.1.0.tar
jointDIMMSC is an extension of DIMMSC that assumes full independence between single-cell RNA and surface protein data. The joint likelihood of the two data sources is constructed as the product of their individual likelihoods, and parameters are inferred with the EM algorithm. In practice, jointDIMMSC runs much faster than BREMSC, but its model assumption is more stringent.
jointDIMMSC(dataProtein, dataRNA, K, useGene = 100, maxiter = 100, tol = 1e-04, lik.tol = 0.01)
jointDIMMSC returns a list object containing:
# First load BREMSC R package
library(BREMSC)
# Next load the example simulated data (dataADT: protein data; dataRNA: RNA data)
data("dataADT")
data("dataRNA")
# Test run of jointDIMMSC
testRun <- jointDIMMSC(dataADT, dataRNA, K=4)
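The independence assumption behind jointDIMMSC means that, for a single cell and a given cluster, the joint log likelihood is the sum of the RNA and protein log likelihoods. The sketch below illustrates this with a Dirichlet-multinomial likelihood for each data source. It is only an illustration: the helper names logDirMult and jointLogLik, the toy counts, and the alpha values are assumptions made here and are not part of the BREMSC package or its internal code.

```r
# Illustrative sketch only; logDirMult and jointLogLik are hypothetical helpers,
# not functions exported by BREMSC.

# Log probability of one cell's count vector x under a Dirichlet-multinomial
# distribution with concentration parameters alpha.
logDirMult <- function(x, alpha) {
  stopifnot(length(x) == length(alpha), all(alpha > 0), all(x >= 0))
  n <- sum(x)
  A <- sum(alpha)
  lgamma(n + 1) - sum(lgamma(x + 1)) +
    lgamma(A) - lgamma(n + A) +
    sum(lgamma(x + alpha) - lgamma(alpha))
}

# Under full independence the joint likelihood is a product,
# so the joint log likelihood is a sum of the two sources.
jointLogLik <- function(xRNA, xADT, alphaRNA, alphaADT) {
  logDirMult(xRNA, alphaRNA) + logDirMult(xADT, alphaADT)
}

# Toy example: one cell with 3 genes and 2 proteins (made-up numbers).
xRNA <- c(4, 0, 1)
xADT <- c(10, 3)
alphaRNA <- c(1.0, 0.5, 0.5)
alphaADT <- c(2.0, 1.0)
jointLogLik(xRNA, xADT, alphaRNA, alphaADT)
```

In a mixture model, a quantity of this kind would be evaluated for every cell and every cluster inside the E-step of the EM algorithm; the sketch leaves out the mixing weights and the parameter updates.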
Like jointDIMMSC, BREMSC uses separate Dirichlet mixture priors to characterize variation across cell types for each data source, but it additionally uses random effects to link the two data sources. Parameters are estimated in a Bayesian framework with Gibbs sampling. BREMSC is much slower than jointDIMMSC. For real applications, nMCMC > 500 is suggested, and running more than 3 chains (set via the nChains parameter) is strongly recommended for better stability. Prescreening of RNA features is also necessary to reduce run time and noise for BREMSC; using fewer than 1000 RNA features is recommended.
BREMSC(dataProtein, dataRNA, K, nChains = 3, nMCMC = 1000, sd_alpha = c(0.5, 1.5), sd_b = c(0.2, 1), sigmaB = 0.8)
BREMSC returns a list object containing:
# First load BREMSC R package
library(BREMSC)
# Next load the example simulated data (dataADT: protein data; dataRNA: RNA data)
data("dataADT")
data("dataRNA")
# Test run of BREMSC (using small number of MCMC here to save time)
testRun <- BREMSC(dataADT, dataRNA, K=4, nChains=2, nMCMC=100)
# Check convergence of the log likelihood
plot(testRun$vecLogLik, type = "l", xlab = "MCMC Iterations", ylab = "Log likelihood")
# Consider increasing nMCMC if the log likelihood does not appear to have converged
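The prescreening of RNA features recommended for BREMSC can be done in many ways. The sketch below shows one simple option: keep the genes with the largest variance after library-size normalization and a log transform, and return the raw counts of those genes. The helper selectTopGenes is hypothetical and not part of BREMSC, the variance criterion is our choice rather than the authors' method, and the code assumes that genes are in rows and cells are in columns.

```r
# Hypothetical helper, not part of BREMSC: keep the nKeep most variable genes.
# Assumes 'counts' is a raw count matrix with genes in rows and cells in columns.
selectTopGenes <- function(counts, nKeep = 1000) {
  stopifnot(is.matrix(counts), nKeep >= 1)
  libSize <- colSums(counts)
  stopifnot(all(libSize > 0))  # every cell must contain at least one count

  # Scale each cell to 10,000 total counts, then apply log(1 + x).
  # The transform is used for ranking only; raw counts are returned.
  normed <- log1p(t(t(counts) / libSize) * 1e4)

  geneVar <- apply(normed, 1, var)
  nKeep <- min(nKeep, nrow(counts))
  keep <- order(geneVar, decreasing = TRUE)[seq_len(nKeep)]

  # Preserve the original gene order among the selected rows.
  counts[sort(keep), , drop = FALSE]
}

# Toy example with simulated counts (50 genes, 20 cells).
set.seed(1)
toyRNA <- matrix(rpois(50 * 20, lambda = 5), nrow = 50,
                 dimnames = list(paste0("gene", 1:50), paste0("cell", 1:20)))
toyFiltered <- selectTopGenes(toyRNA, nKeep = 10)
dim(toyFiltered)

# With real data, assuming dataRNA has genes in rows:
# dataRNA_small <- selectTopGenes(dataRNA, nKeep = 1000)
```

Because the helper returns untransformed counts, its output can still be passed to functions that expect raw count data.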
Xinjun Wang (xiw119@pitt.edu), Wei Chen.