theislab / destiny

R package for single cell and other data analysis using diffusion maps
https://theislab.github.io/destiny/
GNU General Public License v3.0

About the memory #9

Closed gerrardmai closed 6 years ago

gerrardmai commented 6 years ago

Hello. I currently have about 70 thousand cells and about 30 predictors. If I need to run DiffusionMap, how much memory is required? When I put all 70 thousand cells in, it reports the error "problem too large".

Do you have a formula for estimating the memory?

I hope to receive your reply as soon as possible.

Thank you!
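A rough back-of-envelope estimate (an editorial sketch, not a documented formula from destiny): a dense n × n distance or transition matrix of doubles needs 8·n² bytes, and the sparse-to-dense conversion in Matrix/CHOLMOD refuses matrices with more than 2³¹ − 1 entries, which is exactly the "problem too large" error seen here:

# Back-of-envelope memory check for a dense n x n double matrix
n <- 70000
8 * n^2 / 1024^3            # ~36.5 GiB just for one dense matrix
n^2 > .Machine$integer.max  # TRUE: over 2^31 - 1 entries, so the
                            # sparse-to-dense conversion fails outright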

flying-sheep commented 6 years ago

what’s the traceback of that error?

gerrardmai commented 6 years ago

Cholmod error 'problem too large' at file ../Core/cholmod_dense.c, line 105
Calls: DiffusionMap ... as.matrix.Matrix -> as -> asMethod -> as -> asMethod -> .Call

gerrardmai commented 6 years ago

BTW, I want to know how to determine the values of censor_val and censor_range.
I found that if I set censor_val = 15 and censor_range = c(15, 40), it takes less time and there is no error.

flying-sheep commented 6 years ago

for your second question: please read the package's DiffusionMap vignette.

for the first one: how many nearest neighbors are you using?
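For concreteness, the censoring arguments from the question above are passed to DiffusionMap like this (a sketch; which values make sense for censor_val and censor_range is what the vignette explains, and k sets the number of nearest neighbors):

library(destiny)
# `y` is the cells x predictors matrix from this thread;
# censor_val marks censored measurements, censor_range the interval
# they may actually fall into (see the DiffusionMap vignette).
dm <- DiffusionMap(
  y,
  censor_val   = 15,
  censor_range = c(15, 40),
  verbose      = TRUE
)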

gerrardmai commented 6 years ago

I did not set k the first time; I used the default value:

## 1.3 Run diffusion map
dm = DiffusionMap(y, verbose = TRUE)

and I also tried k = 1000:

## 1.3 Run diffusion map
dm = DiffusionMap(y, k = 1000, verbose = TRUE)

Both of them report the same error:

Cholmod error 'problem too large' at file ../Core/cholmod_dense.c, line 105
Calls: DiffusionMap ... as.matrix.Matrix -> as -> asMethod -> as -> asMethod -> .Call

flying-sheep commented 6 years ago

can you try a smaller k? As low as 15 might work!

gerrardmai commented 6 years ago

I tried k = 15; it reports the same error:

> dim(y)
[1] 50000     7
> 
> ## 1.3 Run diffusion map
> dm = DiffusionMap(y,k=15,verbose = TRUE)
finding knns......done. Time: 398.21s
Calculating transition probabilities...Error in asMethod(object) : 
  Cholmod error 'problem too large' at file ../Core/cholmod_dense.c, line 105
Calls: DiffusionMap ... as.matrix.Matrix -> as -> asMethod -> as -> asMethod -> .Call
Timing stopped at: 0.086 0.007 0.239 
Execution halted
flying-sheep commented 6 years ago

can you execute traceback() after the error?

gerrardmai commented 6 years ago

I tried, but I can't. The relevant part of the code:

## 1.3 Run diffusion map
> dm = DiffusionMap(y,k=15,verbose = TRUE)
> traceback()

and the report:

## 1.3 Run diffusion map
> dm = DiffusionMap(y,k=15,verbose = TRUE)
finding knns......done. Time: 389.64s
Calculating transition probabilities...Error in asMethod(object) : 
  Cholmod error 'problem too large' at file ../Core/cholmod_dense.c, line 105
Calls: DiffusionMap ... as.matrix.Matrix -> as -> asMethod -> as -> asMethod -> .Call
Timing stopped at: 0.118 0.001 0.122 
Execution halted
flying-sheep commented 6 years ago

report? can you do it in an R shell?

gerrardmai commented 6 years ago

I use a cluster to run that code: 30 GPUs, 2 × 52 GB of memory.

R version 3.3.2 (2016-10-31) -- "Sincere Pumpkin Patch"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

flying-sheep commented 6 years ago

can you get to an interactive R shell and do it there?

gerrardmai commented 6 years ago

A part of it:

## 1.3 Run diffusion map
> dm = DiffusionMap(y, k = 15, verbose = TRUE)
finding knns......done. Time: 368.64s
Calculating transition probabilities...Error in asMethod(object) :
  Cholmod error 'problem too large' at file ../Core/cholmod_dense.c, line 105
Timing stopped at: 0.084 0.002 0.101
> traceback()
14: .Call(Csparse_to_matrix, from, FALSE, FALSE)
13: asMethod(object)
12: as(as(from, "generalMatrix"), "matrix")
11: asMethod(object)
10: as(x, "matrix")
9: as.matrix.Matrix(x)
8: as.matrix(x)
7: upper.tri(dists)
6: no_censoring(dists, sigma, cb)
5: force(expr)
4: system.time({
       r <- force(expr)
   })
3: verbose_timing(verbose, "Calculating transition probabilities", {
       if (censor)
           censoring(imputed_data, sigma, dists, censor_val, censor_range,
               missing_range, cb)
       else no_censoring(dists, sigma, cb)
   })
2: transition_probabilities(imputed_data, sigma, knn$dist_mat, censor,
       censor_val, censor_range, missing_range, verbose)
1: DiffusionMap(y, k = 15, verbose = TRUE)

flying-sheep commented 6 years ago

I’m seeing 7: upper.tri(dists). This means you’re not using the newest version; the newest version calls upper.tri.sparse:

https://github.com/theislab/destiny/blob/cf6b799002182b9e9bc3144824a44043dae1c33e/R/diffusionmap.r#L351

Please update destiny, then the error will go away.
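The traceback above shows why this blows up: the old code calls base upper.tri() on the sparse distance matrix, which forces as.matrix() to densify it, and the Matrix/CHOLMOD code of that era refused dense matrices with more than 2³¹ − 1 entries. A minimal reproduction of that densification failure, independent of destiny (a sketch):

library(Matrix)
n <- 50000
m <- sparseMatrix(i = 1:n, j = 1:n, x = 1)  # n x n, only n nonzero entries
print(object.size(m), units = "MB")          # under 1 MB while it stays sparse
# base upper.tri() needs a dense matrix; converting asks for
# n^2 = 2.5e9 entries (> 2^31 - 1), which CHOLMOD of that era
# rejected with 'problem too large'
try(as.matrix(m))

The fixed version walks the sparse structure directly via upper.tri.sparse, so the distance matrix is never densified.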

gerrardmai commented 6 years ago

That helps a lot; thank you so much!

flying-sheep commented 6 years ago

sure! the newest version is on Bioconductor, so upgrading should be easy!
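For a current R installation, that upgrade looks like this (a sketch; R releases as old as the 3.3.2 shown above instead went through the now-retired biocLite() installer):

# Install or upgrade destiny from Bioconductor
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install("destiny")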

janinemelsen commented 4 years ago

Hi,

I am running into much the same issue. When working on an HPC cluster, I can run at most 50,000 cells with 11 parameters and k = 1000. Is there a possibility to run the DiffusionMap function in parallel in order to increase the available memory?

flying-sheep commented 4 years ago

11 parameters?

Also, parallel processing will increase your memory usage, not reduce it. You mean chunked processing, right?

Sadly that doesn’t exist currently. But you can do DiffusionMap(pcaMethods::pca(data, nPcs = 50)), which could help, or use a smaller k.
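Spelled out, that PCA-first workaround looks roughly like this (a sketch following the call above; `data` stands in for the cells × markers matrix, and passing the score matrix explicitly is a conservative variant of the direct DiffusionMap(pca) call):

library(destiny)
library(pcaMethods)
# Project onto the top 50 principal components first, so the diffusion
# map works in 50 dimensions instead of the raw feature space.
# (note: the pcaMethods argument is spelled nPcs)
pca <- pca(data, nPcs = 50)
dm  <- DiffusionMap(scores(pca))  # scores(): cells x 50 matrix of PC coordinates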

janinemelsen commented 4 years ago

Yes, I have a flow cytometry dataset, so fewer parameters compared to CyTOF or single-cell RNA-seq. However, I managed to run 100,000 cells (k = 1000) by downloading the newest release of the package from GitHub :)

flying-sheep commented 4 years ago

Great! I’ll add the functionality of doing a PCA first, which should speed things up again.