Multiple cores don't reduce running time

ingobulla commented 5 years ago

Hi,

I'm trying to use multiple cores when running dmrseq but the number of cores I'm using doesn't influence the running time. My test data set contains about 400 contigs and the running time is about 75 seconds in any case.

When executing the command

regions <- dmrseq(bs = bs, cutoff = 0.1, testCovariate = "CellType")

and having used

register(MulticoreParam(n))

upfront, I get the following output of dmrseq, indicating that it tries to use the correct number of cores

# n = 1 Using a single core (backend: BiocParallel:MulticoreParam). Computing on 1 chromosome(s) at a time.

# n = 2 Parallelizing using 2 workers/cores (backend: BiocParallel:MulticoreParam). Computing on 1 chromosome(s) at a time.

# n = 4 Parallelizing using 4 workers/cores (backend: BiocParallel:MulticoreParam). Computing on 1 chromosome(s) at a time.

I'm a bit irritated by the message about computing on 1 chromosome at a time, but this means 1 chromosome at a time per core, or does it?

In any case, do you have an idea what I'm doing wrong?

kdkorthauer commented 5 years ago

Hi @ingobulla,

There are many factors that can influence the running times of code that uses multiple cores. Although it would be nice, we don't always get (or expect) a linear speedup when using more cores. For one, there is a computational startup cost to setting up multiple cores, so there is a tradeoff between the number of computations to be performed and the number of cores.

I'm not sure what you mean by your dataset containing 400 "contigs" - do you mean 400 loci? If so, that is a very small dataset and if you are seeing no gains in using multiple threads, it is likely that the system time for initializing the threads >> system time for computing on the set of 400 loci.

I'm including a reprex below that shows you can get a speedup in computation time as the number of cores increases. The test dataset has 200,000 loci. You can try it out to verify that you see faster runtimes with multiple cores.

The number of chromosomes computed on at a time is actually something you can change. It is controlled by the argument chrsPerChunk, which is documented in the dmrseq function. This does not change the number of cores that are used, but rather the number of times the cores are initialized. The default value is 1 - meaning the data will be operated on 1 chromosome at a time (and cores defined by the backend will be set up for each one). For example, if you have 5 chromosomes and are using 2 cores, the dmrseq function will loop through each of the 5 chromosomes, and perform computations using 2 cores for each one.

You may see a speedup if you increase this argument, so that multiple chromosomes are operated on at a time - this will mean that there will be less overhead in spawning cores. However, there is a tradeoff here with memory, since operating on a larger set of the data will require more memory resources. Since bisulfite-seq data is typically quite large, the default settings are aimed at lowering the memory requirements by computing on a single chromosome at a time (reducing the size of the matrices kept in memory). If memory is not a limiting factor on your system or with your dataset at hand, you can certainly change this parameter.

Best, Keegan

suppressPackageStartupMessages(library(dmrseq))
library(BiocParallel)

# load example data
data(BS.chr21)

# subset of indices to test
set.seed(293)
idx <- sort(sample(1:length(BS.chr21), 200000))

# Using a single core
register(MulticoreParam(1))
s1 <- proc.time()
reg <- dmrseq(bs=BS.chr21[idx,], testCovariate = 'CellType',
              smooth = FALSE)
#> Assuming the test covariate CellType is a factor.
#> Condition: imr90 vs h1
#> Using a single core (backend: BiocParallel:MulticoreParam).
#> Computing on 1 chromosome(s) at a time.
#> Detecting candidate regions with coefficient larger than 0.1 in magnitude.
#> ...Chromosome chr21: 5004 regions scored (2.35 min).
#> * 5004 candidates detected
#> Performing balanced permutations of condition across samples to generate a null distribution of region test statistics
#>
#> Beginning permutation 1
#> ...Chromosome chr21: 192 regions scored (0.13 min).
#> * 1 out of 2 permutations completed (192 null candidates)
#>
#> Beginning permutation 2
#> ...Chromosome chr21: 263 regions scored (0.16 min).
#> * 2 out of 2 permutations completed (263 null candidates)
e1 <- proc.time()

# Using two cores
register(MulticoreParam(2))
s2 <- proc.time()
reg <- dmrseq(bs=BS.chr21[idx,], testCovariate = 'CellType',
              smooth = FALSE)
#> Assuming the test covariate CellType is a factor.
#> Condition: imr90 vs h1
#> Parallelizing using 2 workers/cores (backend: BiocParallel:MulticoreParam).
#> Computing on 1 chromosome(s) at a time.
#>
#> Detecting candidate regions with coefficient larger than 0.1 in magnitude.
#> ...Chromosome chr21: 5004 regions scored (1.3 min).
#> * 5004 candidates detected
#> Performing balanced permutations of condition across samples to generate a null distribution of region test statistics
#>
#> Beginning permutation 1
#> ...Chromosome chr21: 192 regions scored (0.13 min).
#> * 1 out of 2 permutations completed (192 null candidates)
#>
#> Beginning permutation 2
#> ...Chromosome chr21: 263 regions scored (0.15 min).
#> * 2 out of 2 permutations completed (263 null candidates)
e2 <- proc.time()

# Using four cores
register(MulticoreParam(4))
s4 <- proc.time()
reg <- dmrseq(bs=BS.chr21[idx,], testCovariate = 'CellType',
              smooth = FALSE)
#> Assuming the test covariate CellType is a factor.
#> Condition: imr90 vs h1
#> Parallelizing using 4 workers/cores (backend: BiocParallel:MulticoreParam).
#> Computing on 1 chromosome(s) at a time.
#>
#> Detecting candidate regions with coefficient larger than 0.1 in magnitude.
#> ...Chromosome chr21: 5004 regions scored (0.7 min).
#> * 5004 candidates detected
#> Performing balanced permutations of condition across samples to generate a null distribution of region test statistics
#>
#> Beginning permutation 1
#> ...Chromosome chr21: 192 regions scored (0.1 min).
#> * 1 out of 2 permutations completed (192 null candidates)
#>
#> Beginning permutation 2
#> ...Chromosome chr21: 263 regions scored (0.13 min).
#> * 2 out of 2 permutations completed (263 null candidates)
e4 <- proc.time()

# print running times
print(e1-s1)
#>    user  system elapsed
#> 163.863   3.060 167.331
print(e2-s2)
#>    user  system elapsed
#>  14.702   1.133 103.724
print(e4-s4)
#>    user  system elapsed
#>  14.134   1.494  65.110

^{Created on 2019-03-06 by the reprex package (v0.2.1)}

Session info

``` r devtools::session_info() #> ─ Session info ────────────────────────────────────────────────────────── #> setting value #> version R version 3.5.1 (2018-07-02) #> os CentOS Linux 7 (Core) #> system x86_64, linux-gnu #> ui X11 #> language (EN) #> collate en_US.UTF-8 #> ctype en_US.UTF-8 #> tz EST5EDT,M3.2.0,M11.1.0 #> date 2019-03-06 #> #> ─ Packages ────────────────────────────────────────────────────────────── #> package * version date lib source #> AnnotationDbi 1.44.0 2018-10-30 [1] Bioconductor #> AnnotationHub 2.14.4 2019-02-19 [1] Bioconductor #> annotatr 1.8.0 2018-10-30 [1] Bioconductor #> assertthat 0.2.0 2017-04-11 [1] CRAN (R 3.5.0) #> backports 1.1.3 2018-12-14 [1] CRAN (R 3.5.1) #> bibtex 0.4.2 2017-06-30 [1] CRAN (R 3.5.0) #> Biobase * 2.42.0 2018-10-30 [1] Bioconductor #> BiocGenerics * 0.28.0 2018-10-30 [1] Bioconductor #> BiocManager 1.30.4 2018-11-13 [1] CRAN (R 3.5.1) #> BiocParallel * 1.16.6 2019-02-10 [1] Bioconductor #> biomaRt 2.38.0 2018-10-30 [1] Bioconductor #> Biostrings 2.50.2 2019-01-03 [1] Bioconductor #> bit 1.1-14 2018-05-29 [1] CRAN (R 3.5.0) #> bit64 0.9-7 2017-05-08 [1] CRAN (R 3.5.0) #> bitops 1.0-6 2013-08-17 [1] CRAN (R 3.5.0) #> blob 1.1.1 2018-03-25 [1] CRAN (R 3.5.0) #> BSgenome 1.50.0 2018-10-30 [1] Bioconductor #> bsseq * 1.18.0 2018-10-30 [1] Bioconductor #> bumphunter 1.24.5 2018-12-01 [1] Bioconductor #> callr 3.1.1 2018-12-21 [1] CRAN (R 3.5.1) #> cli 1.0.1 2018-09-25 [1] CRAN (R 3.5.0) #> codetools 0.2-15 2016-10-05 [2] CRAN (R 3.5.1) #> colorspace 1.4-0 2019-01-13 [1] CRAN (R 3.5.1) #> crayon 1.3.4 2017-09-16 [1] CRAN (R 3.5.0) #> data.table 1.12.0 2019-01-13 [1] CRAN (R 3.5.1) #> DBI 1.0.0 2018-05-02 [1] CRAN (R 3.5.0) #> DelayedArray * 0.8.0 2018-10-30 [1] Bioconductor #> DelayedMatrixStats 1.4.0 2018-10-30 [1] Bioconductor #> desc 1.2.0 2018-05-01 [1] CRAN (R 3.5.0) #> devtools 2.0.1 2018-10-26 [1] CRAN (R 3.5.0) #> digest 0.6.18 2018-10-10 [1] CRAN (R 3.5.0) #> dmrseq * 1.2.3 2019-02-16 [1] Bioconductor #> doRNG 1.7.1 2018-06-22 [1] CRAN (R 3.5.0) #> dplyr 0.8.0.1 2019-02-15 [1] CRAN (R 3.5.1) #> evaluate 0.13 2019-02-12 [1] CRAN (R 3.5.1) #> foreach 1.4.4 2017-12-12 [1] CRAN (R 3.5.0) #> fs 1.2.6 2018-08-23 [1] CRAN (R 3.5.0) #> GenomeInfoDb * 1.18.2 2019-02-12 [1] Bioconductor #> GenomeInfoDbData 1.2.0 2018-11-01 [1] Bioconductor #> GenomicAlignments 1.18.1 2019-01-04 [1] Bioconductor #> GenomicFeatures 1.34.3 2019-01-28 [1] Bioconductor #> GenomicRanges * 1.34.0 2018-10-30 [1] Bioconductor #> ggplot2 3.1.0 2018-10-25 [1] CRAN (R 3.5.0) #> glue 1.3.0 2018-07-17 [1] CRAN (R 3.5.0) #> gtable 0.2.0 2016-02-26 [1] CRAN (R 3.5.0) #> gtools 3.8.1 2018-06-26 [1] CRAN (R 3.5.0) #> HDF5Array 1.10.1 2018-12-05 [1] Bioconductor #> highr 0.7 2018-06-09 [1] CRAN (R 3.5.0) #> hms 0.4.2 2018-03-10 [1] CRAN (R 3.5.0) #> htmltools 0.3.6 2017-04-28 [1] CRAN (R 3.5.0) #> httpuv 1.4.5.1 2018-12-18 [1] CRAN (R 3.5.1) #> httr 1.4.0 2018-12-11 [1] CRAN (R 3.5.1) #> interactiveDisplayBase 1.20.0 2018-10-30 [1] Bioconductor #> IRanges * 2.16.0 2018-10-30 [1] Bioconductor #> iterators 1.0.10 2018-07-13 [1] CRAN (R 3.5.0) #> knitr 1.21 2018-12-10 [1] CRAN (R 3.5.1) #> later 0.8.0 2019-02-11 [1] CRAN (R 3.5.1) #> lattice 0.20-35 2017-03-25 [2] CRAN (R 3.5.1) #> lazyeval 0.2.1 2017-10-29 [1] CRAN (R 3.5.0) #> limma 3.38.3 2018-12-02 [1] Bioconductor #> locfit 1.5-9.1 2013-04-20 [1] CRAN (R 3.5.0) #> magrittr 1.5 2014-11-22 [1] CRAN (R 3.5.0) #> Matrix 1.2-14 2018-04-13 [2] CRAN (R 3.5.1) #> matrixStats * 0.54.0 2018-07-23 [1] CRAN (R 3.5.0) #> memoise 1.1.0 2017-04-21 [1] CRAN (R 3.5.0) #> mime 0.6 2018-10-05 [1] CRAN (R 3.5.0) #> munsell 0.5.0 2018-06-12 [1] CRAN (R 3.5.0) #> nlme 3.1-137 2018-04-07 [2] CRAN (R 3.5.1) #> outliers 0.14 2011-01-24 [1] CRAN (R 3.5.0) #> permute 0.9-4 2016-09-09 [1] CRAN (R 3.5.0) #> pillar 1.3.1 2018-12-15 [1] CRAN (R 3.5.1) #> pkgbuild 1.0.2 2018-10-16 [1] CRAN (R 3.5.0) #> pkgconfig 2.0.2 2018-08-16 [1] CRAN (R 3.5.0) #> pkgload 1.0.2 2018-10-29 [1] CRAN (R 3.5.0) #> pkgmaker 0.27 2018-05-25 [1] CRAN (R 3.5.0) #> plyr 1.8.4 2016-06-08 [1] CRAN (R 3.5.0) #> prettyunits 1.0.2 2015-07-13 [1] CRAN (R 3.5.0) #> processx 3.2.1 2018-12-05 [1] CRAN (R 3.5.1) #> progress 1.2.0 2018-06-14 [1] CRAN (R 3.5.0) #> promises 1.0.1 2018-04-13 [1] CRAN (R 3.5.0) #> ps 1.3.0 2018-12-21 [1] CRAN (R 3.5.1) #> purrr 0.3.1 2019-03-03 [1] CRAN (R 3.5.1) #> R.methodsS3 1.7.1 2016-02-16 [1] CRAN (R 3.5.0) #> R.oo 1.22.0 2018-04-22 [1] CRAN (R 3.5.0) #> R.utils 2.8.0 2019-02-14 [1] CRAN (R 3.5.1) #> R6 2.4.0 2019-02-14 [1] CRAN (R 3.5.1) #> RColorBrewer 1.1-2 2014-12-07 [1] CRAN (R 3.5.0) #> Rcpp 1.0.0 2018-11-07 [1] CRAN (R 3.5.1) #> RCurl 1.95-4.12 2019-03-04 [1] CRAN (R 3.5.1) #> readr 1.3.1 2018-12-21 [1] CRAN (R 3.5.1) #> regioneR 1.14.0 2018-10-30 [1] Bioconductor #> registry 0.5-1 2019-03-05 [1] CRAN (R 3.5.1) #> remotes 2.0.2 2018-10-30 [1] CRAN (R 3.5.0) #> reshape2 1.4.3 2017-12-11 [1] CRAN (R 3.5.0) #> rhdf5 2.26.2 2019-01-02 [1] Bioconductor #> Rhdf5lib 1.4.2 2018-12-03 [1] Bioconductor #> rlang 0.3.1 2019-01-08 [1] CRAN (R 3.5.1) #> rmarkdown 1.11 2018-12-08 [1] CRAN (R 3.5.1) #> rngtools 1.3.1 2018-05-15 [1] CRAN (R 3.5.0) #> rprojroot 1.3-2 2018-01-03 [1] CRAN (R 3.5.0) #> Rsamtools 1.34.1 2019-01-31 [1] Bioconductor #> RSQLite 2.1.1 2018-05-06 [1] CRAN (R 3.5.0) #> rtracklayer 1.42.2 2019-03-01 [1] Bioconductor #> S4Vectors * 0.20.1 2018-11-09 [1] Bioconductor #> scales 1.0.0 2018-08-09 [1] CRAN (R 3.5.0) #> sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 3.5.0) #> shiny 1.2.0 2018-11-02 [1] CRAN (R 3.5.0) #> stringi 1.3.1 2019-02-13 [1] CRAN (R 3.5.1) #> stringr 1.4.0 2019-02-10 [1] CRAN (R 3.5.1) #> SummarizedExperiment * 1.12.0 2018-10-30 [1] Bioconductor #> testthat 2.0.1 2018-10-13 [1] CRAN (R 3.5.0) #> tibble 2.0.1 2019-01-12 [1] CRAN (R 3.5.1) #> tidyselect 0.2.5 2018-10-11 [1] CRAN (R 3.5.0) #> usethis 1.4.0 2018-08-14 [1] CRAN (R 3.5.0) #> withr 2.1.2 2018-03-15 [1] CRAN (R 3.5.0) #> xfun 0.5 2019-02-20 [1] CRAN (R 3.5.1) #> XML 3.98-1.18 2019-03-04 [1] CRAN (R 3.5.1) #> xtable 1.8-3 2018-08-29 [1] CRAN (R 3.5.0) #> XVector 0.22.0 2018-10-30 [1] Bioconductor #> yaml 2.2.0 2018-07-25 [1] CRAN (R 3.5.0) #> zlibbioc 1.28.0 2018-10-30 [1] Bioconductor #> #> [1] /n/home06/kkorthauer/apps/R-release #> [2] /n/sw/helmod/apps/centos7/Core/R_core/3.5.1-fasrc01/lib64/R/library ```

ingobulla commented 5 years ago

I'm not sure what you mean by your dataset containing 400 "contigs" - do you mean 400 loci? If so, that is a very small dataset and if you are seeing no gains in using multiple threads, it is likely that the system time for initializing the threads >> system time for computing on the set of 400 loci.

Sorry, didn't notice that this would not be clear outside my field: We are working on some oyster for which no assembled genome exists. That is, we don't have any chromosomes but merely tens of thousands of contigs. So, my test data set contains 400 contigs, having 5000 methylation loci.

The number of chromosomes computed on at a time is actually something you can change. It is controlled by the argument chrsPerChunk, which is documented in the dmrseq function. This does not change the number of cores that are used, but rather the number of times the cores are initialized. The default value is 1 - meaning the data will be operated on 1 chromosome at a time (and cores defined by the backend will be set up for each one). For example, if you have 5 chromosomes and are using 2 cores, the dmrseq function will loop through each of the 5 chromosomes, and perform computations using 2 cores for each one.

You may see a speedup if you increase this argument, so that multiple chromosomes are operated on at a time - this will mean that there will be less overhead in spawning cores. However, there is a tradeoff here with memory, since operating on a larger set of the data will require more memory resources. Since bisulfite-seq data is typically quite large, the default settings are aimed at lowering the memory requirements by computing on a single chromosome at a time (reducing the size of the matrices kept in memory). If memory is not a limiting factor on your system or with your dataset at hand, you can certainly change this parameter.

Setting chrsPerChunk to the number of contigs reduced the running time from 75 to 25 seconds on my test data set (using 6 cores). So, we will now try to improve the speed on one of the real data sets (on which the running time was about a week) using chrsPerChunk.

Thanks for your help!

kdkorthauer commented 5 years ago

Ah, that makes sense. Yes, it seems dmrseq was treating the contigs as chromosomes. Glad to hear you were able to reduce the running time on the test dataset. Please don't hesitate to reach out if you run into any more issues.

Best, Keegan

kdkorthauer / dmrseq

Multiple cores don't reduce running time #21