Closed. AndrewSkelton closed this issue 1 year ago.
Thanks for this - we had a few questions for debugging.
Does this work with LogNormalization rather than SCT?
If you run this in the terminal, rather than RStudio, can you see what error message is thrown?
Are you able to merge the 65 objects together (not even worrying about integration)?
Thanks for the direction. I started with question 3 (it made the most sense to me and was easy to implement). Running a merge in the terminal and in RStudio results in the same issue: RStudio produces the same RSession abort, and the terminal session produces the output below.
> foo <- merge(x = data_list[[1]], y = data_list[-1])
zsh: killed R
So that narrows the problem down, but I'm pretty sure I can't use the traceback function if it kills the session.
I tried increasing the memory limit to ~1TB by setting R_MAX_VSIZE=1000Gb in ~/.Renviron. The error persists.
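For anyone trying the same workaround, the setting lives in a plain text file read at R startup (the value is just the one from the attempt above). Note that, as documented in ?Memory, R_MAX_VSIZE only raises R's own vector-heap ceiling on macOS, so it can't help when the operating system itself kills the process, which is what "zsh: killed" usually indicates:

```
# ~/.Renviron -- read at R startup; raises the vector heap limit (macOS only)
R_MAX_VSIZE=1000Gb
```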
So I'm a bit stumped.
Small update - I've narrowed it to 20 samples (404,327 cells) that can be successfully merged before this issue kicks in.
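For anyone else narrowing this down the same way, here is a sketch of an incremental merge with on-disk progress logging. data_list and the merge() call come from the thread above; the loop, the log file name, and the gc() calls are my own additions. Since the crash kills the session, logging to a file before each step means the last line of the log identifies the sample that triggered it:

```r
# Merge one sample at a time so the crash point identifies the offending object.
# Assumes data_list is the list of 65 Seurat objects from the thread.
library(Seurat)

merged <- data_list[[1]]
for (i in 2:length(data_list)) {
  # Write progress to disk before each merge; if R is killed,
  # the log still shows the last sample attempted.
  cat(sprintf("merging sample %d of %d\n", i, length(data_list)),
      file = "merge_progress.log", append = TRUE)
  # May need add.cell.ids if barcodes collide across samples.
  merged <- merge(x = merged, y = data_list[[i]])
  gc()  # release intermediate allocations between merges
}
```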
Thanks so much. Just to clarify again, since it helps us debug: does your example work successfully on the full dataset using the LogNormalize workflow?
The LogNormalize workflow successfully generates an AnchorSet for all 65 samples using rpca with a single reference sample. Unfortunately it crashes on the IntegrateData step. Same kind of error: RSession abort.
Did this ever get figured out? I am getting a similar issue with the R session aborting during SCTransform. It only happens on the last step, and only for my dataset with 200k cells (not the one with 150k cells).
Hi, thanks for the wonderful package. I am having the same issue with ~100k cells
I ended up having to use 500GB of RAM and it started working, even though I should have had enough before (120GB of RAM for a 40GB dataset).
I have a similar issue but with a different error message. LogNormalization works fine but it gives an error at "IntegrateData" if I use SCT.
Here's code:
...
ref.list <- lapply(X = ref.list, FUN = function(x) {
  x <- suppressWarnings(SCTransform(x, verbose = FALSE))
})
sct.features <- SelectIntegrationFeatures(object.list = ref.list, nfeatures = 3000, verbose = FALSE)
ref.list <- PrepSCTIntegration(object.list = ref.list, anchor.features = sct.features)
ref.list <- lapply(X = ref.list, FUN = function(x) {
  x <- RunPCA(x, features = sct.features, verbose = FALSE)
})
sct.anchors <- FindIntegrationAnchors(object.list = ref.list, normalization.method = "SCT", reduction = "rpca", anchor.features = sct.features, verbose = FALSE)
integrated <- IntegrateData(anchorset = sct.anchors, normalization.method = 'SCT', dims = 1:50, verbose = FALSE)
Here's error msg:
Error in RowMergeMatricesList(mat_list = all.mat, mat_rownames = all.rownames, : Need S4 class dgRMatrix for a sparse matrix
Traceback:
1. IntegrateData(anchorset = sct.anchors, normalization.method = "SCT",
. dims = 1:50, verbose = FALSE)
2. PairwiseIntegrateReference(anchorset = anchorset, new.assay.name = new.assay.name,
. normalization.method = normalization.method, features = features,
. features.to.integrate = features.to.integrate, dims = dims,
. k.weight = k.weight, weight.reduction = weight.reduction,
. sd.weight = sd.weight, sample.tree = sample.tree, preserve.order = preserve.order,
. eps = eps, verbose = verbose)
3. merge(x = object.1, y = object.2, merge.data = TRUE)
4. merge.Seurat(x = object.1, y = object.2, merge.data = TRUE)
5. merge(x = assays.merge[[1]], y = assays.merge[2:length(x = assays.merge)],
. merge.data = merge.data)
6. merge.Assay(x = assays.merge[[1]], y = assays.merge[2:length(x = assays.merge)],
. merge.data = merge.data)
7. RowMergeSparseMatrices(mat1 = counts.mats[[1]], mat2 = counts.mats[2:length(x = counts.mats)])
8. RowMergeMatricesList(mat_list = all.mat, mat_rownames = all.rownames,
. all_rownames = all.names)
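One check worth trying here (an assumption on my part, not a confirmed fix) is to inspect the class of each object's SCT counts matrix before merging, since RowMergeMatricesList fails when one of the matrices is not in the sparse format it expects. ref.list and the "SCT" assay name come from the code above; the coercion to CsparseMatrix is my suggestion:

```r
# Inspect the class of each object's SCT counts matrix; the merge step
# errors out when a matrix is dense or of an unexpected sparse class.
library(Seurat)
library(Matrix)

sapply(ref.list, function(x) class(GetAssayData(x, assay = "SCT", slot = "counts")))

# If any matrix is not a standard dgCMatrix, coerce it back:
for (i in seq_along(ref.list)) {
  m <- GetAssayData(ref.list[[i]], assay = "SCT", slot = "counts")
  if (!inherits(m, "dgCMatrix")) {
    ref.list[[i]] <- SetAssayData(ref.list[[i]], assay = "SCT",
                                  slot = "counts",
                                  new.data = as(m, "CsparseMatrix"))
  }
}
```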
Hi, I'm having a similar issue while integrating 6 10x datasets (the list with the 6 objects is 22.9GB).
I can run NormalizeData + FindVariableFeatures, but I get "R session aborted - fatal error" while running FindIntegrationAnchors. I also planned to try FindIntegrationAnchors with reduction = "rpca", but I run into a fatal error again during ScaleData.
I'm running on a machine with 12 cores and 128GB RAM, latest versions of R and Seurat.
I am actually using a published dataset (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE161621) and their code (https://github.com/neurojacob/blum_et_al_2021/blob/master/paper_analysis_jupyter.ipynb), so it's supposed to work... But I also tried to strictly follow the Seurat vignettes, with the same results.
Any help would be appreciated. I don't know if I should try a more powerful machine for instance.
Had a similar problem when working with 200k+ cells. It went through the FindIntegrationAnchors step using CCA, although it took a few hours, and then RStudio aborts during the integration step.
I've also got the same problem trying to integrate data of 200K nuclei from patients with different driver mutations. Everything works well up until the FindIntegrationAnchors function. I've tried RStudio and R (running under Ubuntu 16); both fail and keep returning "cannot allocate buffer" or "this vector is too large". I'm really stumped and would look forward to an update from anyone who has figured this out. Thanks!
I ran into a similar problem with 10K cells, has anyone figured out a solution to this?
Hi, I'm having the same problem, R session aborted during the integration step (12 datasets, 75K nuclei).
Problem too large error when merging 2M cells
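In my experience "problem too large" usually comes from CHOLMOD, the C library behind the Matrix package, when an intermediate result would exceed its integer indexing limits (around 2^31 - 1 entries); a standard dgCMatrix cannot hold more non-zeros than that either. A rough sanity check (a sketch; counts_list is a placeholder for your list of per-sample sparse count matrices):

```r
# Check whether the merged counts matrix would exceed the 2^31 - 1
# non-zero-entry limit of a dgCMatrix (and of CHOLMOD's int indices).
library(Matrix)

total_cells <- sum(vapply(counts_list, ncol, numeric(1)))
total_nnz   <- sum(vapply(counts_list,
                          function(m) as.numeric(length(m@x)),
                          numeric(1)))
total_nnz > .Machine$integer.max  # TRUE means a plain dgCMatrix cannot hold it
```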
I'm having the same problem here with 230k cells. I'm testing SCTransform+harmony
Same problem here with 340K cells. Has it ever been solved? I am stuck.
Same problem, 84K cells with around 169GB RAM on a node of an HPC cluster.
Same problem, 68K cells with around 150GB RAM.
Same problem, failing at the IntegrateData step for 17 datasets (~460K cells), although the function runs for 6 processes and then fails. Running on an HPC cluster with 120GB RAM, using the RPCA method with SCTransform.
Hi all, please refer to the new beta release of Seurat v5 for support for analyzing and integrating large datasets!
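For anyone landing here, the v5 workflow keeps all samples as layers of one object instead of a list, then calls IntegrateLayers. The names below (split on the assay, RPCAIntegration, IntegrateLayers) are from the v5 documentation as I understand it; treat the exact arguments as assumptions and check the current integration vignette:

```r
# Seurat v5 sketch: one layer per sample, then layer-based integration.
# Assumes obj is a single Seurat object with a "sample" column in meta.data.
library(Seurat)

obj[["RNA"]] <- split(obj[["RNA"]], f = obj$sample)  # one counts layer per sample
obj <- SCTransform(obj)
obj <- RunPCA(obj)
obj <- IntegrateLayers(object = obj, method = RPCAIntegration,
                       normalization.method = "SCT",
                       new.reduction = "integrated.rpca")
```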
Hi,
First and foremost, thanks for the hard work in making such a lovely framework to analyse with!
My problem is around 65 10x samples that I'm trying to integrate, which comes to around 1M cells. I'll outline my code, session info, and machine stats below. This code runs fine with 1/3 of the data (~18 samples, ~330,000 cells) in about an hour post-SCT.
Methodology: Each sample ran through SCT, then integration via SelectIntegrationFeatures, PrepSCTIntegration, RunPCA, FindIntegrationAnchors (rpca mode with a reference), and IntegrateData.
Expected Behaviour: Integration anchors generated.
Actual Behaviour: RSession abort / crash. No error message produced.
Machine: 0.5TB memory, 16 cores.
Code
1. Run SCT on Each Sample
2. Import and Integrate
The code above fails at the FindIntegrationAnchors step. It doesn't progress to any sample-by-sample operations (via messages in the console), and then after about an hour it crashes the RSession. The same code works perfectly on fewer samples, and I'm doubtful this is a memory error. Any way to narrow down what the problem might be? Thanks for any help!
Andrew
Session Info