navinlabcode / copykat

running copykat with a sparse matrix #27

Closed: hartlama closed this issue 3 years ago

hartlama commented 3 years ago

Hello,

I am trying to run copykat on a very large data set (60000 cells). I am inputting a sparse matrix (dgCMatrix), but am still getting a memory error, shown below. How can I get around this?

copykat.test <- copykat(rawmat=exp.rawdata, id.type="S", ngene.chr=5, win.size=25, KS.cut=0.1, sam.name="mIDH", distance="euclidean", norm.cell.names="", n.cores=8)
[1] "running copykat v1.0.4"
[1] "step1: read and filter data ..."
[1] "49056 genes, 62123 cells in raw data"
Error in asMethod(object) :
  Cholmod error 'problem too large' at file ../Core/cholmod_dense.c, line 105

traceback()
6: asMethod(object)
5: as(x, "matrix")
4: as.matrix.Matrix(X)
3: as.matrix(X)
2: apply(rawmat, 2, function(x) (sum(x > 0)))
1: copykat(rawmat = exp.rawdata, id.type = "S", ngene.chr = 5, win.size = 25, KS.cut = 0.1, sam.name = "mIDH", distance = "euclidean", norm.cell.names = "", n.cores = 8)
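For a sense of scale: the log reports 49056 genes x 62123 cells, about 3.05 billion entries. That is more than `.Machine$integer.max` (2,147,483,647), which appears to be the size limit CHOLMOD (used by the Matrix package) places on dense matrices. This would explain why the `as.matrix()` call triggered by `apply()` in the traceback fails even though the input is a `dgCMatrix`. The R sketch below checks that arithmetic. On a small random stand-in for `exp.rawdata` (an assumption, not real data), it also shows that the per-cell count from the failing `apply()` line can be computed with `Matrix::colSums(rawmat > 0)` without building a dense copy. This illustrates a possible change; it is not copykat's own code.

```r
library(Matrix)

# Size of the raw matrix reported in step1 of the log above.
n.genes <- 49056
n.cells <- 62123
n.entries <- n.genes * n.cells        # computed in double precision: 3047505888
n.entries > .Machine$integer.max      # TRUE: too many entries for a dense CHOLMOD matrix
n.entries * 8 / 1e9                   # roughly 24.4 GB as a dense double matrix

# Hypothetical toy sparse count matrix (genes x cells) standing in for exp.rawdata.
set.seed(1)
rawmat <- rsparsematrix(nrow = 1000, ncol = 200, density = 0.05,
                        rand.x = function(n) rpois(n, 3) + 1)

# Dense route, as in the traceback: apply() converts the whole matrix with as.matrix().
dense.count <- apply(as.matrix(rawmat), 2, function(x) sum(x > 0))

# Sparse route: rawmat > 0 stays sparse (zeros compare FALSE), and
# Matrix::colSums() sums each column without densifying.
sparse.count <- Matrix::colSums(rawmat > 0)

all.equal(as.numeric(dense.count), as.numeric(sparse.count))  # TRUE
```

On the toy matrix both routes give the same detected-gene count per cell. Only the sparse route would avoid the 'problem too large' error at full size.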

gaobio commented 3 years ago

Great point about switching to sparse matrices. We may make this switch in the next version; we haven't had a chance yet. But please, please run one sample at a time: combining samples will not work. CopyKat uses an internal control within the data as its baseline, so sample-to-sample variation may be picked up as CNAs when you run multiple samples together.
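A minimal R sketch of the one-sample-at-a-time advice follows. It assumes that each cell's sample can be read from a barcode prefix such as `S1_...`. This naming scheme is hypothetical, so substitute whatever sample labels you have. The sketch uses base `split()` to cut a sparse matrix into one `dgCMatrix` per sample. The `copykat()` call is left commented out and only reuses the arguments from the original post, with `sam.name` set per sample.

```r
library(Matrix)

# Hypothetical toy data: 3 samples x 100 cells, barcodes prefixed with the sample name.
set.seed(2)
rawmat <- rsparsematrix(nrow = 500, ncol = 300, density = 0.05,
                        rand.x = function(n) rpois(n, 3) + 1)
rownames(rawmat) <- paste0("gene", seq_len(nrow(rawmat)))
colnames(rawmat) <- paste0(rep(c("S1", "S2", "S3"), each = 100),
                           "_cell", seq_len(ncol(rawmat)))

# Sample label for each cell: the text before the first "_" (assumed naming scheme).
sample.id <- sub("_.*$", "", colnames(rawmat))

# One sparse sub-matrix per sample; column subsetting keeps the dgCMatrix class.
per.sample <- lapply(split(seq_len(ncol(rawmat)), sample.id),
                     function(idx) rawmat[, idx, drop = FALSE])

# Each sample would then go through copykat() on its own, for example:
# results <- lapply(names(per.sample), function(s)
#   copykat(rawmat = per.sample[[s]], id.type = "S", ngene.chr = 5,
#           win.size = 25, KS.cut = 0.1, sam.name = s, distance = "euclidean",
#           norm.cell.names = "", n.cores = 8))
```

Judging by the traceback, copykat v1.0.4 still makes a dense copy in step1. A single sample is only a fraction of the size of the combined 62123-cell matrix, though.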