chengwenxuan1997 closed this 2 years ago
Hi @chengwenxuan1997,
Thanks for reporting this. I don't believe it is a bug, however. Unfortunately, using multiple cores does not automatically translate into a proportional speedup in computation time. In parallel computing there is always some overhead: the cost of setting up the workers and transferring data to them. Whether adding cores pays off depends on how much computation is done on each core, among other things. It looks like you are running this on a small-scale example, where relatively few computations are done on each core (e.g. only chromosome 21). As you scale up (e.g. full genome), I expect you will see gains when multiple cores are used compared to only one.
Best, Keegan
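The overhead argument above can be illustrated with a toy cost model. This is only a sketch and not how dmrseq measures or schedules anything: the function names, the assumption that work splits evenly, and the fixed per-worker overhead are all hypothetical, and the numbers are made up for illustration.

```python
def estimated_runtime(total_work_s, workers, per_worker_overhead_s):
    """Toy estimate of wall-clock time in seconds.

    Assumptions (hypothetical, for illustration only):
    - the work divides evenly across workers;
    - every worker beyond the first adds a fixed setup/transfer cost.
    """
    if workers < 1:
        raise ValueError("workers must be >= 1")
    compute = total_work_s / workers
    overhead = per_worker_overhead_s * (workers - 1)
    return compute + overhead


def best_worker_count(total_work_s, per_worker_overhead_s, max_workers):
    """Return the worker count in 1..max_workers with the lowest estimate."""
    return min(
        range(1, max_workers + 1),
        key=lambda w: estimated_runtime(total_work_s, w, per_worker_overhead_s),
    )


if __name__ == "__main__":
    # Made-up numbers: a small job (20 s of work) and a large job (2000 s),
    # each with 30 s of overhead per additional worker.
    for label, work in (("small job", 20), ("large job", 2000)):
        for w in (1, 2, 4):
            print(label, "workers =", w, "estimate =", estimated_runtime(work, w, 30))
```

Under these assumptions the small job gets slower with every added worker, while the large job keeps improving until the overhead term catches up with the savings, which matches the pattern described in this thread.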
Thank you for the quick reply. I ran dmrseq on a dataset of one million WGBS sites using 1, 3, 5, 8, 10, 15, and 20 threads. Parallel computation did indeed outperform the single thread once the number of threads was more than 5. Perhaps the threshold depends on the dataset size, but it is clear that the parallel computation does work. Thanks very much.
Thank you for this powerful tool, but I came across a problem when calling DMRs. When using the function "dmrseq", adding more cores did not speed up the computation. On the contrary, the more cores I used, the longer it took.
For example, with nworkers set to 2, a single permutation took 0.44 min, and the whole run took 85 s:
However, with nworkers set to 4, a single permutation took 0.78 min, and the whole run took 146 s: