kdkorthauer / dmrseq

R package for Inference of differentially methylated regions (DMRs) from bisulfite sequencing
MIT License
54 stars, 14 forks

Running dmrseq using more than 1 core gives an error #29

Closed. MohamedRefaat92 closed this issue 4 years ago

MohamedRefaat92 commented 4 years ago

Hi,

I'm running dmrseq on a large dataset with 27507595 methylation loci and 25 samples. With the default parallelization settings, the run fails. The log shows:

    Parallelizing using 54 workers/cores (backend: BiocParallel:MulticoreParam).
    Computing on 1 chromosome(s) at a time.

and then I get this error:

    ...Chromosome chr1: Error in serialize(data, node$con, xdr = FALSE) : error writing to connection

A workaround that worked for me is to reduce the number of cores to one. But given the size of the data, this takes more than 24 hrs!

I've seen in #21 how increasing the number of cores decreases the running time, and it would be great if you could help me overcome this error.

Thanks, Mohamed Shoeb

kdkorthauer commented 4 years ago

Hi @MohamedRefaat92,

Yes, you should be able to use multiple cores. I am guessing you are running out of memory trying to use 54 cores. To see if that's the case, can you see if you get the same error if you specify 2 cores? If that runs successfully, you can try increasing from 2 to a larger number. But eventually, you're going to reach a plateau for how much the additional cores reduce computation time (due to overhead of initializing the cores), as well as run into the memory limit (since each core uses memory).

Hope that helps!

Best, Keegan
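
The suggestion above (start with 2 cores, then increase) could be sketched roughly as follows. This is a minimal sketch, not the maintainer's code. `worker_schedule()` and `run_with_workers()` are hypothetical helpers, the doubling step is an assumption chosen for illustration, and `bs` and `"condition"` stand in for your own BSseq object and test covariate. `MulticoreParam(workers = ...)` and the `BPPARAM` argument of `dmrseq()` are the only package calls used.

```r
# Hypothetical helper: worker counts to try, doubling from `start`
# up to (at most) `max_workers`. Doubling is an assumption, not a rule.
worker_schedule <- function(max_workers, start = 2L) {
  stopifnot(start >= 1L, max_workers >= start)
  n_steps <- floor(log2(max_workers / start))
  as.integer(start * 2^(0:n_steps))
}

# Hypothetical wrapper: run dmrseq with an explicit number of forked workers
# instead of the registered default backend.
# `bs` is a BSseq object; `test_covariate` names a column of pData(bs).
run_with_workers <- function(bs, test_covariate, n_workers) {
  param <- BiocParallel::MulticoreParam(workers = n_workers)
  dmrseq::dmrseq(bs = bs, testCovariate = test_covariate, BPPARAM = param)
}

# Usage sketch (commented out because `bs` and "condition" are placeholders):
# for (n in worker_schedule(54L)) {
#   elapsed <- system.time(
#     regions <- run_with_workers(bs, "condition", n)
#   )[["elapsed"]]
#   message(n, " workers: ", round(elapsed), " s")
# }
```

If a given worker count fails with the same `serialize` error or the elapsed time stops improving, the previous count is probably a reasonable choice for that machine.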