satijalab / seurat

R toolkit for single cell genomics
http://www.satijalab.org/seurat

“problem too large” when using CreateGeneActivityMatrix #2359

Closed CarnoZhao closed 4 years ago

CarnoZhao commented 4 years ago

Hi, I'm using CreateGeneActivityMatrix with a large ATAC peak matrix (~400000 x 15000). The as() call that converts the sparse matrix to a dense matrix fails with the "problem too large" error.

https://github.com/satijalab/seurat/blob/fc4a4f5203227832477a576bfe01bc6efeb23f51/R/preprocessing.R#L231-L251
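
For a sense of scale, here is a quick back-of-the-envelope check of the dimensions quoted above. It assumes (the thread does not confirm this) that the error comes from the dense conversion needing more elements than a 32-bit integer index can address, and that each dense element takes 8 bytes as a double.

```r
# Dimensions quoted in the report (approximate)
n_peaks <- 400000
n_cells <- 15000

# Plain numeric (double) arithmetic, so the product itself cannot overflow
n_elements <- n_peaks * n_cells

# Compare against the largest 32-bit signed integer
exceeds_int_max <- n_elements > .Machine$integer.max

# Approximate size of a dense double matrix, in GiB (8 bytes per element)
dense_gib <- n_elements * 8 / 1024^3

cat("elements:", format(n_elements, big.mark = ","), "\n")
cat("exceeds 32-bit integer limit:", exceeds_int_max, "\n")
cat("dense size (GiB):", round(dense_gib, 1), "\n")
```

Under those assumptions, the dense matrix would be too large to index with 32-bit integers well before available RAM becomes the limiting factor.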

Maybe we could keep peak.matrix as a dgCMatrix and replace those lines with something like this:

... # code before sapply; keep peak.matrix as a dgCMatrix
newmat.list = lapply(..., function(x) {
    features.use = ...
    submat = peak.matrix[features.use, ]
    # sum the peaks mapped to this gene; a single peak is already a vector
    if (length(features.use) > 1) {submat = Matrix::colSums(submat)}
    # submat is a vector with one value per cell; convert it to a sparse matrix
    return(as(as.matrix(submat), Class = "dgCMatrix"))
})
# as.matrix converts each vector to an n x 1 matrix (column vector),
# so cbind gives a cells x features matrix; transpose it to features x cells
newmat = t(do.call(cbind, newmat.list))
rownames(x = newmat) <- all.features
colnames(x = newmat) <- colnames(x = peak.matrix)
return(newmat)

I'm not sure whether this method is time-consuming, but it does save memory and avoids the problem too large error.
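
The idea above can be tried on a toy example. This is a minimal sketch, not the Seurat implementation: the peak matrix, the annotations data frame, and the gene and peak names are all made up for illustration, and it builds one sparse row per gene with Matrix() and rbind rather than the as.matrix/cbind route in the post.

```r
library(Matrix)

# Hypothetical toy data: 5 peaks x 4 cells, stored as a sparse dgCMatrix
peak.matrix <- sparseMatrix(
  i = c(1, 2, 2, 3, 4, 5),
  j = c(1, 1, 2, 3, 4, 2),
  x = c(1, 2, 3, 4, 5, 6),
  dims = c(5, 4),
  dimnames = list(paste0("peak", 1:5), paste0("cell", 1:4))
)

# Hypothetical peak-to-gene mapping
annotations <- data.frame(
  peak = paste0("peak", 1:5),
  gene.name = c("geneA", "geneA", "geneB", "geneC", "geneA"),
  stringsAsFactors = FALSE
)

all.features <- unique(annotations$gene.name)

# One sparse 1 x n_cells row per gene; peak.matrix is never made dense
row.list <- lapply(all.features, function(gene) {
  features.use <- annotations$peak[annotations$gene.name == gene]
  # drop = FALSE keeps the subset a sparse matrix even for a single peak
  submat <- peak.matrix[features.use, , drop = FALSE]
  Matrix(Matrix::colSums(submat), nrow = 1, sparse = TRUE)
})

# Stack the rows into a genes x cells sparse matrix and restore the names
gene.matrix <- do.call(rbind, row.list)
dimnames(gene.matrix) <- list(all.features, colnames(peak.matrix))

print(gene.matrix)
```

Only one gene's worth of counts is held as a dense vector at any time, which is where the memory saving comes from; the per-gene subsetting of a sparse matrix is also the likely source of the extra run time.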

satijalab commented 4 years ago

This seems like a reasonable solution - we will implement it if it does not affect functionality (@timoast)

timoast commented 4 years ago

This is now available on the develop branch as the keep.sparse parameter (c25ee5feb9bec7d66653741a93fd1f539ec94e78). Keep in mind that maintaining the matrix as a sparse matrix will be significantly slower than using a dense matrix for this function, which is why we did not write it originally to use a sparse matrix.

sscien commented 2 years ago

Hi, I'm having a hard time understanding how to use keep.sparse. Would it also apply to the merge process? I have tried many times to merge 2 million cells. No matter how much memory I allocated (up to 2.5 TB), it kept giving me the problem too large error.

sscien commented 2 years ago

Could you specify how to use keep.sparse? Is it available in Seurat 4.1.1? Thanks!
