Report a bug in parallel computation

chengwenxuan1997 commented 2 years ago

Thank you for this powerful tool. I ran into a problem when calling DMRs: with the function dmrseq, adding cores did not speed up the computation. On the contrary, the more cores I used, the longer it took.

For example, with 2 workers, each permutation took 0.44 min and the whole run took 85 s:

    library(dmrseq)
    library(BiocParallel)  # provides register() and SnowParam()
    data(BS.chr21)
    testCovariate <- 'CellType'

    register(BPPARAM = SnowParam(workers = 2))
    Sys.time()
    [1] "2022-04-30 10:39:23 CST"
    regions <- dmrseq(bs=BS.chr21[240001:250000,], cutoff = 0.05, testCovariate=testCovariate)

    Assuming the test covariate CellType is a factor.
    Condition: imr90 vs h1
    Parallelizing using 2 workers/cores (backend: BiocParallel:SnowParam).
    Computing on 1 chromosome(s) at a time.

    Detecting candidate regions with coefficient larger than 0.05 in magnitude.
    ...Chromosome chr21: Smoothed (0 min). 192 regions scored (0.5 min).

    192 candidates detected
    Performing balanced permutations of condition across samples to generate a null distribution of region test statistics

    Beginning permutation 1
    ...Chromosome chr21: Smoothed (0 min). 76 regions scored (0.44 min).

    1 out of 2 permutations completed (76 null candidates)

    Beginning permutation 2
    ...Chromosome chr21: Smoothed (0 min). 73 regions scored (0.44 min).

    2 out of 2 permutations completed (73 null candidates)
    Warning messages:
    1: In serialize(data, node$con) : 'package:stats' may not be available when loading
    2: In serialize(data, node$con) : 'package:stats' may not be available when loading
    3: In serialize(data, node$con) : 'package:stats' may not be available when loading
    4: In serialize(data, node$con) : 'package:stats' may not be available when loading
    5: In serialize(data, node$con) : 'package:stats' may not be available when loading
    6: In serialize(data, node$con) : 'package:stats' may not be available when loading
    Sys.time()
    [1] "2022-04-30 10:40:48 CST"

However, with 4 workers, each permutation took about 0.78 min and the whole run took 146 s:

    register(BPPARAM = SnowParam(workers = 4))
    Sys.time()
    [1] "2022-04-30 10:43:23 CST"
    regions <- dmrseq(bs=BS.chr21[240001:250000,], cutoff = 0.05, testCovariate=testCovariate)

    Assuming the test covariate CellType is a factor.
    Condition: imr90 vs h1
    Parallelizing using 4 workers/cores (backend: BiocParallel:SnowParam).
    Computing on 1 chromosome(s) at a time.

    Detecting candidate regions with coefficient larger than 0.05 in magnitude.
    ...Chromosome chr21: Smoothed (0 min). 192 regions scored (0.84 min).

    192 candidates detected
    Performing balanced permutations of condition across samples to generate a null distribution of region test statistics

    Beginning permutation 1
    ...Chromosome chr21: Smoothed (0 min). 76 regions scored (0.78 min).

    1 out of 2 permutations completed (76 null candidates)

    Beginning permutation 2
    ...Chromosome chr21: Smoothed (0 min). 73 regions scored (0.76 min).

    2 out of 2 permutations completed (73 null candidates)
    There were 12 warnings (use warnings() to see them)
    Sys.time()
    [1] "2022-04-30 10:45:49 CST"
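
The overall run times quoted above can be checked against the Sys.time() stamps in the two transcripts. A minimal base R sketch: the helper name elapsed_sec is hypothetical, and the time zone is fixed to UTC purely for parsing (an assumption; the difference does not depend on it).

```r
# Elapsed seconds between two timestamps copied from the transcripts.
# tz = "UTC" is an assumption used only for parsing; the session used CST.
elapsed_sec <- function(start, end) {
  as.numeric(difftime(as.POSIXct(end, tz = "UTC"),
                      as.POSIXct(start, tz = "UTC"),
                      units = "secs"))
}

two_workers  <- elapsed_sec("2022-04-30 10:39:23", "2022-04-30 10:40:48")
four_workers <- elapsed_sec("2022-04-30 10:43:23", "2022-04-30 10:45:49")
print(c(two_workers = two_workers, four_workers = four_workers))
```

This gives 85 s for the 2-worker run and 146 s for the 4-worker run, matching the figures reported.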

kdkorthauer commented 2 years ago

Hi @chengwenxuan1997,

Thanks for reporting this. I don't believe it is a bug, however. Unfortunately, using multiple cores does not automatically translate into a proportional speedup. Parallel computing always carries some overhead: the cost of setting up the workers and transferring data to them. Whether adding cores pays off depends on how much computation is done on each core, among other things. It looks like you are running a small-scale example, where relatively little computation is done on each core (e.g. only chromosome 21). As you scale up (e.g. to the full genome), I expect you will see gains from multiple cores compared to only one.

Best, Keegan
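
To make the overhead argument concrete, here is a toy cost model in base R. It is only a sketch and not how dmrseq or BiocParallel account for time. It assumes that each worker adds a fixed setup and transfer cost and that the work itself divides evenly across workers. The function name model_time and all numbers are made up for illustration, not measured.

```r
# Toy model (assumption): wall time = per-worker overhead * workers
#                                     + total work / workers.
model_time <- function(work_sec, workers, overhead_sec) {
  overhead_sec * workers + work_sec / workers
}

workers <- c(1, 2, 4)

# Small job: overhead dominates, so more workers means a slower run.
small <- model_time(work_sec = 20, workers = workers, overhead_sec = 15)

# Large job: the work dominates, so more workers means a faster run.
large <- model_time(work_sec = 2000, workers = workers, overhead_sec = 15)

print(data.frame(workers, small, large))
```

Under these assumed numbers the small job gets slower as workers are added while the large job gets faster, which is the qualitative pattern described above.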

chengwenxuan1997 commented 2 years ago

Thank you for the quick reply. I ran dmrseq on a dataset of one million WGBS sites using 1, 3, 5, 8, 10, 15, and 20 threads. Parallel computation did indeed outperform the single thread once the number of threads was more than 5. The threshold probably depends on the dataset size, but it is clear that parallel computation does work. Thanks very much.
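
For anyone who wants to repeat this kind of comparison, a minimal sketch of a timing helper follows. The names time_by_workers and run_task are hypothetical and not part of dmrseq. The helper calls a user-supplied function once per worker count and records the elapsed time with system.time().

```r
# Time a task at several worker counts.
# run_task is a hypothetical callback: it takes a worker count and does the work.
time_by_workers <- function(run_task, worker_counts) {
  seconds <- vapply(worker_counts, function(n) {
    unname(system.time(run_task(n))["elapsed"])
  }, numeric(1))
  data.frame(workers = worker_counts, seconds = seconds)
}

# Stand-in task so the sketch runs anywhere; replace it with a real workload.
demo <- time_by_workers(function(n) Sys.sleep(0.01), c(1, 2, 4))
print(demo)

# Possible use with dmrseq (not run here; requires dmrseq and BiocParallel):
# library(dmrseq); library(BiocParallel); data(BS.chr21)
# run_dmrseq <- function(n) {
#   dmrseq(bs = BS.chr21[240001:250000, ], cutoff = 0.05,
#          testCovariate = "CellType", BPPARAM = SnowParam(workers = n))
# }
# time_by_workers(run_dmrseq, c(1, 2, 4))
```

Passing the backend through the BPPARAM argument keeps each run independent of whatever was last set with register(). Timings from a single run are noisy, so repeating each setting a few times is advisable before drawing conclusions about the break-even point.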

kdkorthauer / dmrseq

Report a bug in parallel computation #49