Zhixuan-Jing opened this issue 2 weeks ago.
Dear @Zhixuan-Jing, yesterday we released version 2.0 of GSVA (see https://bioconductor.org/install for installation instructions). It has a specific "sparse" regime that allows GSVA to deal efficiently with single-cell data stored either in `dgCMatrix` objects or in `SingleCellExperiment` objects that use `dgCMatrix` objects to store their assay data. Regarding the specific code you are showing, according to the Seurat wiki you need to grab the `data` slot, which should contain the log-normalized counts. If that slot is a `dgCMatrix`, you should be able to build the parameter object directly from that `dgCMatrix` object; you then only need to specify your gene sets and the minimum and maximum gene-set sizes (`minSize`, `maxSize`) that you want to analyze. The `BPPARAM` argument of `gsva()` should allow you to parallelize the calculations. Please consult the help page of `gsvaParam()`, and do not hesitate to get back to us in case of problems or questions.
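A minimal sketch of the workflow described above, assuming GSVA 2.0 or later is installed from Bioconductor. Because the Seurat object from this issue isn't available, a small random `dgCMatrix` built with `Matrix::rsparsematrix()` stands in for the log-normalized `data` layer; the gene names, the gene sets `setA`/`setB`, and the use of `SerialParam()` are hypothetical choices for illustration only.

```r
# Minimal sketch, assuming GSVA >= 2.0 (with Matrix and BiocParallel) is installed.
# The matrix and gene sets below are toy, hypothetical data.
library(Matrix)
library(GSVA)
library(BiocParallel)

set.seed(123)
n_genes <- 1000
n_cells <- 200

# Toy sparse matrix standing in for the log-normalized "data" layer of a
# Seurat object; rsparsematrix() returns a dgCMatrix.
exp <- rsparsematrix(n_genes, n_cells, density = 0.1,
                     rand.x = function(n) log1p(rpois(n, lambda = 5) + 1))
dimnames(exp) <- list(paste0("gene", seq_len(n_genes)),
                      paste0("cell", seq_len(n_cells)))

# Gene sets as a named list of character vectors whose identifiers match
# rownames(exp); setA and setB are hypothetical.
gs_list <- list(setA = rownames(exp)[1:50],
                setB = rownames(exp)[101:180])

# Build the parameter object directly from the dgCMatrix (no as.matrix()),
# keeping only gene sets with 15 to 500 genes present in the data.
# kcdf is left at its default here.
gsva_par <- gsvaParam(exp, gs_list, minSize = 15, maxSize = 500)

# BPPARAM controls parallelization: SerialParam() runs on one core;
# something like MulticoreParam(workers = 8) could be used instead.
gsva_es <- gsva(gsva_par, verbose = FALSE, BPPARAM = SerialParam())

dim(gsva_es)  # gene sets x cells
```

For the object in this issue, the toy `exp` would be replaced by the log-normalized layer (with Seurat v5, for example, `LayerData(cancer, assay = "RNA", layer = "data")`, assuming `NormalizeData()` has been run), and `gs_list` by the list of gene sets from the question. Skipping `as.matrix()` keeps the data sparse, so GSVA can use its sparse regime.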
I ran GSVA on a small bulk RNA-seq dataset and it worked well. However, when I ran it on a large single-cell dataset with MSigDB gene sets, an error occurred. My code and data types are shown below:
```r
# cancer is a Seurat object with about 690,000 cells
exp <- cancer@assays[['RNA']]$counts
exp <- as.matrix(exp)
hEMT <- c('ACKR3', 'ADM', ...)
gs_list <- list(c(hEMT))
names(gs_list) <- c("hEMT")
gsva_par <- gsvaParam(exp, gs_list, kcdf = 'Gaussian', minSize = 15, maxSize = 500, maxDiff = TRUE)
gsva_es <- GSVA::gsva(gsva_par)
```
The output was as follows: