IntegrateData() fails: Error in validityMethod(as(object, superClass))

vertesy commented 3 years ago

Parameter	Value
Date	08/12/2020
Time	~23:00
Queue	m
Node	?
Memory requested	?1500
CPUs requested	?65
CPUs used	30

> workers
[1] 30
> tic(); combined.obj <- IntegrateData(anchorset = anchors, dims = 1:p$'n.CC'); toc(); say()
Finding integration vector weights
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Integrating data
Merging dataset 17 into 22 33
Extracting anchors for merged samples
Finding integration vectors
Finding integration vector weights
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|

...

Integrating data
Merging dataset 22 33 17 36 37 35 38 39 40 1 into 6 15 18 19 14 20 7 8 10 11 9
Extracting anchors for merged samples
Finding integration vectors
Finding integration vector weights
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Integrating data
Merging dataset 24 41 23 31 29 30 28 3 21 32 25 26 27 12 13 16 into 6 15 18 19 14 20 7 8 10 11 9 22 33 17 36 37 35 38 39 40 1
Extracting anchors for merged samples
Finding integration vectors
Error in validityMethod(as(object, superClass)) :
  long vectors not supported yet: ../../src/include/Rinlinedfuns.h:535
In addition: There were 50 or more warnings (use warnings() to see the first 50)

The pipe broke overnight after this, so I cannot see the output of warnings().

vertesy commented 3 years ago

Relevant issues:

https://github.com/satijalab/seurat/issues/2063

I think the issue here is the use of large numbers of genes for features.to.integrate. This creates a non-sparse matrix for all genes, and is infeasible for any method - it's not a specific problem with the Seurat alignment workflow. We do not suggest batch-correcting all genes, only ones that exhibit variation across single cells, which are informative for downstream clustering analyses.

https://github.com/satijalab/seurat/issues/1029

Thanks for the question - we've explored this and the cause is that there are so many anchors, that it creates a sparse matrix with >2^31 elements in R, which can throw an error.

This happens to me when I give a large number of genes for features.to.integrate.

I don't think this is Seurat's problem, but a problem with the Matrix package, which still doesn't support vectors with more than 2^31 elements. It's just that a sparse matrix with too many non-zero elements is produced. This can be worked around by using the sparse matrix package spam64, but that will require changes to Seurat's source code. Actually supporting long vectors is on the to-do list of the Matrix developers, but somehow they still haven't implemented it.
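To make the 2^31 limit concrete, here is a minimal base-R sketch of the arithmetic for the dense case described in the first quote. The cell and gene counts are hypothetical values chosen for illustration; they are not taken from the failing run. It does not cover the sparse anchor matrix mentioned in the second quote.

```r
# Largest length of a standard (non-long) R vector: 2^31 - 1
max_len <- .Machine$integer.max

# Hypothetical dataset size, for illustration only
n_cells <- 120000
n_genes <- c(1000, 3000, 20000)

# Number of entries if the integrated matrix is stored densely.
# as.numeric() avoids integer overflow in the multiplication itself.
n_elements <- as.numeric(n_cells) * n_genes
exceeds_limit <- n_elements > max_len

# Largest number of genes that still fits under the limit for this cell count
max_genes <- floor(max_len / n_cells)

print(data.frame(n_genes, n_elements, exceeds_limit))
print(max_genes)
```

With these assumed numbers, only the 20000-gene case crosses the limit, so this calculation alone does not explain failures reported with about 1000 features.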

vertesy commented 3 years ago

Suggestions

Make one dataset the reference: add reference = 1 to anchors <- FindIntegrationAnchors(seus, normalization.method = "SCT", anchor.features = features_use, reference = 1).
Use rPCA -> see error in #8
Decrease the number of genes for features.to.integrate (debated, error at 1000) -> will try
Do not specify anything for dims or features.to.integrate: IntegrateData(anchorset = scData.Anchors)
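For orientation, the first two suggestions might be combined as in the sketch below. This is not the exact code used in this thread: it follows the general SCT plus rPCA pattern from the Seurat documentation, and it assumes that the Seurat package is installed and that seus is a list of Seurat objects already normalized with SCTransform(). The function name integrate_sct_rpca and the default values for n_features and dims are hypothetical.

```r
# Sketch only: assumes Seurat is installed and `seus` is a list of
# SCTransform()-normalized Seurat objects. Name and defaults are hypothetical.
integrate_sct_rpca <- function(seus, n_features = 3000, dims = 1:30,
                               reference = NULL) {
  # Pick a limited set of variable features shared across the datasets
  features <- Seurat::SelectIntegrationFeatures(
    object.list = seus, nfeatures = n_features
  )
  seus <- Seurat::PrepSCTIntegration(
    object.list = seus, anchor.features = features
  )
  # rPCA needs a PCA on every object, computed on the shared features
  seus <- lapply(seus, function(x) {
    Seurat::RunPCA(x, features = features, verbose = FALSE)
  })
  anchors <- Seurat::FindIntegrationAnchors(
    object.list = seus,
    normalization.method = "SCT",
    anchor.features = features,
    reduction = "rpca",
    reference = reference,  # NULL compares all pairs; e.g. 1 uses dataset 1
    dims = dims
  )
  Seurat::IntegrateData(
    anchorset = anchors, normalization.method = "SCT", dims = dims
  )
}
```

A call would look like combined.obj <- integrate_sct_rpca(seus, reference = 1). Whether this avoids the long-vector error depends on the dataset.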

vertesy commented 3 years ago

Making one dataset the reference makes an invalid assumption for our analysis → NO
rPCA finally worked, but it was very tough to get it to run
"Decrease the number of genes" did not solve it → NO
Defaults did not solve it → NO

aelhossiny commented 2 years ago

Hi, I am facing the same problem. My dataset is around 122k cells from 32 samples. Both methods (CCA and rPCA) fail at the IntegrateData() step. I tried as few as 1k variable features, but it still does not work. How did you get the rPCA method to work?

Note: both methods work fine when integrating using the previous normalization methods (log2 norm)

tinakeshav commented 2 years ago

Hey @vertesy, how did you get rPCA to run with SCT normalization? I've been struggling with this for weeks now and would infinitely appreciate any input.
