satijalab / seurat

R toolkit for single cell genomics
http://www.satijalab.org/seurat
Other

IntegrateLayers SparseMatrix error #7344

Closed trebbiano closed 3 months ago

trebbiano commented 1 year ago

Hi, I'm having trouble running the new sketch integration. Specifically, I run into memory issues at several steps of the integration. My understanding was that BPCells and sketching methods were designed to let large datasets be analyzed on systems with limited RAM, but I can't integrate this dataset even on a 512 GB node. The latest error message is below. Am I doing something incorrectly, or does even sketch integration require more than 500 GB of RAM?

Extracting anchors for merged samples
Finding integration vectors
Error in .T2C(newTMat(i = c(ij1[, 1], ij2[, 1]), j = c(ij1[, 2], ij2[,  :
  unable to coerce from TsparseMatrix to [CR]sparseMatrix when length of 'i' slot exceeds 2^31-1
In addition: Warning messages:
1: `invoke()` is deprecated as of rlang 0.4.0.
Please use `exec()` or `inject()` instead.
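
The error above reports an index-size limit rather than exhausted RAM: the Matrix package stores sparse-matrix indices in integer slots, and R integers are 32-bit, so one matrix cannot hold more than 2^31 - 1 stored entries. As a rough sketch (assuming a numeric triplet matrix, the TsparseMatrix form named in the error, and ignoring object overhead), the memory needed to reach that limit is only about 32 GiB, far below the ~486 GB shown as free:

```r
# The limit named in the error: the largest 32-bit signed integer.
index_limit <- .Machine$integer.max
stopifnot(index_limit == 2^31 - 1)

# Approximate bytes for a numeric triplet (dgTMatrix) with n stored entries:
# a 4-byte integer row index (i), a 4-byte integer column index (j),
# and an 8-byte double value (x) per entry.
triplet_bytes <- function(n_entries) {
  as.numeric(n_entries) * (4 + 4 + 8)
}

# Memory of a triplet matrix sitting exactly at the index limit, in GiB.
gib_at_limit <- triplet_bytes(index_limit) / 1024^3
round(gib_at_limit, 1)  # 32
```

So a node with more RAM does not help here; the matrix has to stay under the entry limit.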

The command used was as follows:

object <- IntegrateLayers(object, method=RPCAIntegration, orig.reduction="pca", new.reduction="integrated.rpca", 
dims=1:30, k.anchor=20, verbose=TRUE)

The dataset contains about 90 source samples totaling ~350k cells.

System details:

CentOS Linux release 7.9.2009 (Core)
> packageVersion("Seurat")
[1] ‘4.9.9.9044’
R version 4.3.0 (2023-04-21)
              total        used        free      shared  buff/cache   available
Mem:           503G         13G        486G         32M        3.2G        488G
Swap:           15G          0B         15G

Thanks!

yuhanH commented 1 year ago

Hi, could you post the script you ran for this sketch integration?

trebbiano commented 1 year ago

Of course:

library("Seurat")
load("./data/object.Rd")
object[["RNA"]] <- split(object[["RNA"]], f=object$orig.ident)
object <- FindVariableFeatures(object, verbose=TRUE)
object <- SketchData(object = object, ncells = 5000, method = "LeverageScore", sketched.assay = "sketch")
DefaultAssay(object) <- "sketch"
object <- FindVariableFeatures(object, verbose = TRUE)

library("future")
plan("multisession", workers=4)
options(future.globals.maxSize = 7000 * 1024^2)

object <- ScaleData(object, verbose = TRUE)
object <- RunPCA(object, verbose = FALSE)
object <- IntegrateLayers(object, method=RPCAIntegration, orig.reduction="pca",
                         new.reduction="integrated.rpca",
                         dims=1:30,
                         k.anchor=20,
                         verbose=TRUE)
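
The script above runs future with four multisession workers and raises `future.globals.maxSize` to 7000 MiB. As a rough sketch (assuming each multisession worker is a separate R process holding its own copy of the globals exported to it, and that all four export up to the cap at once), the cap alone permits roughly 27 GiB of exported data across workers, on top of the main session:

```r
# Settings taken from the script above.
workers <- 4
max_globals_bytes <- 7000 * 1024^2  # value passed to future.globals.maxSize

bytes_to_gib <- function(bytes) bytes / 1024^3

# Upper bound per worker, then across all concurrently running workers.
per_worker_gib <- bytes_to_gib(max_globals_bytes)
total_gib <- workers * per_worker_gib
round(per_worker_gib, 2)  # 6.84
round(total_gib, 2)       # 27.34
```
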

cailing20 commented 1 year ago

I did not use sketch integration, but have a similar problem. The error occurred in the middle of the IntegrateData step.

Integrating data
Merging dataset 3 13 1 into 2 17 18 5
Extracting anchors for merged samples
Finding integration vectors
Error in .T2C(newTMat(i = c(ij1[, 1], ij2[, 1]), j = c(ij1[, 2], ij2[,  : 
  unable to coerce from TsparseMatrix to [CR]sparseMatrix when length of 'i' slot exceeds 2^31-1
Calls: IntegrateData ... FindIntegrationMatrix -> - -> - -> .Arith.Csparse -> .T2C
Execution halted
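
The call stack shows the failure in `.T2C`, during the coercion from triplet form (TsparseMatrix) to compressed form that the error message names. A minimal sketch with the Matrix package on a toy 3 × 3 matrix shows why the stored-entry count is what matters: the triplet `i` slot holds one (zero-based) row index per stored entry, and the compressed form's `p` slot holds cumulative entry counts per column, so its last element equals the total number of stored entries and must fit in an R integer:

```r
library(Matrix)

# A 3 x 3 matrix with three stored entries, converted to triplet form.
m <- as(sparseMatrix(i = c(1, 2, 3), j = c(1, 3, 2), x = c(0.5, -1, 2),
                     dims = c(3, 3)), "TsparseMatrix")
class(m)     # a dgTMatrix
length(m@i)  # 3: one row index per stored entry

# Coerce to compressed sparse column form.
mc <- as(m, "CsparseMatrix")
class(mc)    # a dgCMatrix
mc@p         # 0 1 2 3: cumulative entry counts per column
```
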

Command used:

sc_list[] <- lapply(sc_list, FUN = RunPCA, features = features, npcs = 50)
sc_anchors <- FindIntegrationAnchors(object.list = sc_list, normalization.method = "SCT",
                                     anchor.features = features, reduction = "rpca",
                                     dims = 1:50, k.anchor = 5)
sc_combined <- IntegrateData(anchorset = sc_anchors, normalization.method = "SCT", dims = 1:50)

I have 20 samples with a total of 146k cells, run on a 512 GB node with:

SeuratObject_4.1.3 Seurat_4.3.0
R version 4.1.1 (2021-08-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Red Hat Enterprise Linux Server 7.7 (Maipo)

Any suggestions would be appreciated!

jeraldnoble commented 1 year ago

I get the same sparse matrix error with Seurat v4 CCA integration, but not with Seurat v5. Perhaps the issue is the size limit for sparse matrices in the Matrix package?
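
The error message points the same way: the limit is on the number of stored entries (2^31 - 1), not on total memory. One rough way to reason about it is to ask how many columns a fully dense matrix with one row per cell could have before crossing that limit. The sketch below uses the cell counts from this thread (~350k and 146k); the feature counts (2,000 and 20,000) are illustrative assumptions, the helper names are hypothetical, and which matrix Seurat actually builds during integration is not established here:

```r
# Hypothetical helper: would an n_rows x n_cols matrix with the given
# fraction of non-zero entries store more values than the index limit allows?
exceeds_index_limit <- function(n_rows, n_cols, density = 1) {
  est_entries <- as.numeric(n_rows) * as.numeric(n_cols) * density
  est_entries > .Machine$integer.max
}

# Hypothetical helper: largest fully dense column count that still fits.
max_dense_cols <- function(n_rows) {
  floor(.Machine$integer.max / n_rows)
}

max_dense_cols(350000)               # 6135
max_dense_cols(146000)               # 14708
exceeds_index_limit(350000, 2000)    # FALSE
exceeds_index_limit(146000, 20000)   # TRUE
```
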

shivUSF commented 7 months ago

Did you find a solution for this? If so, could you please share it?

mhkowalski commented 3 months ago

Hi,

I'm closing this, as it seems to be a v4-specific problem. I'd highly recommend upgrading to v5 to run sketch integration on large datasets (it should not take 512 GB of RAM). Please open a new issue if you still have this problem with sketch integration in Seurat v5.
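
Before following this advice, it can help to confirm which Seurat release is installed. `packageVersion()` returns a `package_version` object, which compares correctly against a version string. A minimal sketch (the helper name `is_v5_or_later` is hypothetical; the version strings are the ones reported in this thread):

```r
# Hypothetical helper: is a version string at or above the 5.0.0 release?
is_v5_or_later <- function(version) {
  package_version(version) >= "5.0.0"
}

is_v5_or_later("4.3.0")       # FALSE (reported by cailing20)
is_v5_or_later("4.9.9.9044")  # FALSE (reported in the first post)
is_v5_or_later("5.0.0")       # TRUE

# With Seurat installed, check the running version:
# is_v5_or_later(as.character(packageVersion("Seurat")))
```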