xuranw / MuSiC

Multi-subject Single Cell Deconvolution
https://github.com/xuranw/MuSiC
GNU General Public License v3.0

Handle Huge Sparse Matrices #86

Open julvi opened 2 years ago

julvi commented 2 years ago

Hi @xuranw,

The expression matrix of my reference scRNAseq dataset is huge (27804 genes x 118535 cells) and loads into R as a dgCMatrix object. Unfortunately, the ExpressionSet function cannot handle the dgCMatrix class:

Error in (function (classes, fdef, mtable) : unable to find an inherited method for function ‘ExpressionSet’ for signature ‘"dgCMatrix"’

And the as.matrix function cannot convert the dgCMatrix object into a normal matrix:

Error in asMethod(object) : Cholmod error 'problem too large' at file ../Core/cholmod_dense.c, line 102

Do you have a workaround to run MuSiC on huge expression matrices?
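
For context, a quick size check suggests why as.matrix fails here. This is a minimal sketch, assuming the 'problem too large' error is raised because the dense result would hold more elements than a 32-bit integer can index; the variable names are illustrative only.

```r
# Dimensions reported above (plain numeric literals, so the product is a
# double and does not overflow the way integer arithmetic would)
n_genes <- 27804
n_cells <- 118535

# Number of elements the dense matrix would need
n_elements <- n_genes * n_cells

# TRUE: the dense matrix would exceed .Machine$integer.max (2^31 - 1) elements
exceeds_int_limit <- n_elements > .Machine$integer.max
print(exceeds_int_limit)

# Approximate memory for a dense double matrix (8 bytes per element), in GiB
dense_gib <- n_elements * 8 / 1024^3
print(round(dense_gib, 1))
```

Even if the conversion succeeds by some other route, the dense matrix needs tens of GiB of RAM, so any workaround has to account for memory as well as the size limit.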

sam0per commented 2 years ago

@julvi I am encountering the exact same issue. How did you solve it?

chenx9 commented 2 years ago

I have also encountered this problem. Have you solved your problem? @julvi

Jospin-Dk commented 2 years ago

I am currently facing this problem as well. The expression matrix for my scRNAseq reference is 33694 x 92385. Is there any workaround to create the ExpressionSet object required to run MuSiC?

IStevant commented 1 year ago

Same problem here. The SingleCellExperiment package handles sparse matrices, but I am not sure it is supported by MuSiC.

FrankStarling commented 1 year ago

Same problem. Any solutions?

saeedfc commented 1 year ago

Hi All,

Just saw this while passing by. I deal with it this way: convert the matrix part by part, then stitch the parts together. It's not the most optimized piece of code, but it does the job.

## x is the large sparse matrix (dgCMatrix)
## ncol_break is the number of columns in each dense chunk; converting in
## chunks avoids the 'problem too large' error on the full matrix

dGC_to_matrix <- function(x, ncol_break = 49999) {
  total_cols <- ncol(x) ## Total columns in the dgCMatrix (works without colnames too)
  ## First column of each chunk: 1, 1 + ncol_break, ...
  ## Also covers the case where the whole matrix fits in a single chunk
  starts <- seq(1, total_cols, by = ncol_break)
  ## Last column of each chunk; the final chunk stops at total_cols
  ends <- pmin(starts + ncol_break - 1, total_cols)
  matrix_list <- vector("list", length(starts)) ## one slot per dense chunk
  for (i in seq_along(starts)) {
    message("part_number:", i, "; cols-", starts[i], ":", ends[i])
    matrix_list[[i]] <- as.matrix(x[, starts[i]:ends[i], drop = FALSE])
  }
  return(do.call(cbind, matrix_list)) ## cbind the chunks back together
}

Example:
full_mtx <- dGC_to_matrix(full_dgc, 49999)
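
A variation on the same chunking idea, as a hedged sketch rather than a tested recipe for MuSiC: instead of collecting the chunks in a list and calling cbind, fill a preallocated dense matrix, so the chunk list and the combined result never sit in memory together. The helper name sparse_to_dense_prealloc is hypothetical (it is not part of MuSiC or Matrix), and whether R can allocate the full dense matrix depends on the RAM available on your machine.

```r
library(Matrix)

# Hypothetical helper: convert a sparse matrix to dense in column chunks,
# writing each chunk into a preallocated result.
sparse_to_dense_prealloc <- function(x, ncol_break = 50000) {
  # Allocate the dense result once, keeping row and column names
  out <- matrix(0, nrow = nrow(x), ncol = ncol(x), dimnames = dimnames(x))
  if (ncol(x) == 0) return(out)
  starts <- seq(1, ncol(x), by = ncol_break)
  for (s in starts) {
    e <- min(s + ncol_break - 1, ncol(x))
    # Only this chunk is densified at a time
    out[, s:e] <- as.matrix(x[, s:e, drop = FALSE])
  }
  out
}

# Sanity check on a small random sparse matrix (toy data, not real counts)
set.seed(1)
toy <- rsparsematrix(20, 13, density = 0.2)
dimnames(toy) <- list(paste0("gene", 1:20), paste0("cell", 1:13))
dense <- sparse_to_dense_prealloc(toy, ncol_break = 5)
print(all.equal(dense, as.matrix(toy)))
```

The toy check only confirms that the chunked result matches a direct as.matrix call on a small input; it says nothing about whether the full reference matrix fits in memory.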

gevro commented 3 months ago

Since MuSiC2 still uses ExpressionSet as input, and ExpressionSet does not accept a sparse dgCMatrix, is there any other way to run MuSiC2 with sparse matrices?