mixOmicsTeam / mixOmics

Development repository for the Bioconductor package 'mixOmics'
http://mixomics.org/

Parallelisation is not working in perf() #292

Open mvacher opened 1 year ago

🐞 Describe the bug:

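perf() does not appear to use multiple cores even when BPPARAM is set to a parallel backend: the run time stays at about 25 seconds regardless of the backend or the number of workers.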


πŸ” reprex results from reproducible example including sessioninfo():

library(mixOmics) 
library(dplyr)
library(BiocParallel)
library(microbenchmark)

## -------------------------------------------------------------------------------------------------------------------
data(breast.TCGA) # load in the data
data = list(miRNA = breast.TCGA$data.train$mirna, # set a list of all the X dataframes
            mRNA = breast.TCGA$data.train$mrna,
            proteomics = breast.TCGA$data.train$protein)

Y = breast.TCGA$data.train$subtype # set the response variable as the Y dataframe

## -------------------------------------------------------------------------------------------------------------------
design = matrix(0.1, ncol = length(data), 
                nrow = length(data), # for square matrix filled with 0.1s
                dimnames = list(names(data), names(data)))
diag(design) = 0 # set diagonal to 0s

basic.diablo.model = block.splsda(X = data, Y = Y, ncomp = 5, design = design) # form basic DIABLO

## -------------------------------------------------------------------------------------------------------------------
# Benchmark
n_rep = 1
res <- list(
  "MulticoreParam(10)" = microbenchmark(
    perf(basic.diablo.model, validation = 'Mfold',
         folds = 10, nrepeat = 10,
         progressBar = FALSE,
         BPPARAM = MulticoreParam(workers = 10)),
    times = n_rep),
  "MulticoreParam(5)" = microbenchmark(
    perf(basic.diablo.model, validation = 'Mfold',
         folds = 10, nrepeat = 10,
         progressBar = FALSE,
         BPPARAM = MulticoreParam(workers = 5)),
    times = n_rep),
  "MulticoreParam(2)" = microbenchmark(
    perf(basic.diablo.model, validation = 'Mfold',
         folds = 10, nrepeat = 10,
         progressBar = FALSE,
         BPPARAM = MulticoreParam(workers = 2)),
    times = n_rep),
  "SnowParam(10)" = microbenchmark(
    perf(basic.diablo.model, validation = 'Mfold',
         folds = 10, nrepeat = 10,
         progressBar = FALSE,
         BPPARAM = BiocParallel::SnowParam(workers = 10)),
    times = n_rep),
  "SnowParam(5)" = microbenchmark(
    perf(basic.diablo.model, validation = 'Mfold',
         folds = 10, nrepeat = 10,
         progressBar = FALSE,
         BPPARAM = BiocParallel::SnowParam(workers = 5)),
    times = n_rep),
  "SnowParam(2)" = microbenchmark(
    perf(basic.diablo.model, validation = 'Mfold',
         folds = 10, nrepeat = 10,
         progressBar = FALSE,
         BPPARAM = BiocParallel::SnowParam(workers = 2)),
    times = n_rep),
  "SerialParam(1)" = microbenchmark(
    perf(basic.diablo.model, validation = 'Mfold',
         folds = 10, nrepeat = 10,
         progressBar = FALSE,
         BPPARAM = SerialParam()),
    times = n_rep))
bind_rows(res)

Results:

Unit: seconds
                                  expr      min       lq     mean   median       uq
BPPARAM = MulticoreParam(workers = 10) 25.17865 25.17865 25.17865 25.17865 25.17865
 BPPARAM = MulticoreParam(workers = 5) 25.37876 25.37876 25.37876 25.37876 25.37876
 BPPARAM = MulticoreParam(workers = 2) 25.19722 25.19722 25.19722 25.19722 25.19722
      BPPARAM = SnowParam(workers = 10) 25.45244 25.45244 25.45244 25.45244 25.45244
       BPPARAM = SnowParam(workers = 5) 25.81489 25.81489 25.81489 25.81489 25.81489
       BPPARAM = SnowParam(workers = 2) 25.91184 25.91184 25.91184 25.91184 25.91184
                BPPARAM = SerialParam() 25.55273 25.55273 25.55273 25.55273 25.55273

sessionInfo()

R version 4.2.3 (2023-03-15)
Platform: aarch64-apple-darwin22.3.0 (64-bit)
Running under: macOS Ventura 13.0

Matrix products: default
LAPACK: /opt/homebrew/Cellar/r/4.2.3/lib/R/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] microbenchmark_1.4.9 BiocParallel_1.32.5  dplyr_1.1.1          mixOmics_6.22.0      ggplot2_3.4.1        lattice_0.20-45        

loaded via a namespace (and not attached):
 [1] MatrixGenerics_1.10.0 tidyr_1.3.0           jsonlite_1.8.4        ellipse_0.4.4         stats4_4.2.3          yaml_2.3.7            ggrepel_0.9.3         corrplot_0.92        
 [9] pillar_1.9.0          glue_1.6.2            reticulate_1.28       digest_0.6.31         RColorBrewer_1.1-3    colorspace_2.1-0      cowplot_1.1.1         htmltools_0.5.5      
[17] Matrix_1.5-3          plyr_1.8.8            pkgconfig_2.0.3       pheatmap_1.0.12       dir.expiry_1.6.0      purrr_1.0.1           corpcor_1.6.10        scales_1.2.1         
[25] HDF5Array_1.26.0      RSpectra_0.16-1       Rtsne_0.16            tibble_3.2.1          generics_0.1.3        IRanges_2.32.0        withr_2.5.0           BiocGenerics_0.44.0  
[33] cli_3.6.1             magrittr_2.0.3        evaluate_0.20         fansi_1.0.4           forcats_1.0.0         tools_4.2.3           lifecycle_1.0.3       matrixStats_0.63.0   
[41] basilisk.utils_1.10.0 stringr_1.5.0         Rhdf5lib_1.20.0       S4Vectors_0.36.2      munsell_0.5.0         DelayedArray_0.24.0   compiler_4.2.3        rlang_1.1.0          
[49] rhdf5_2.42.0          grid_4.2.3            rhdf5filters_1.10.0   rstudioapi_0.14       igraph_1.4.1          rmarkdown_2.21        basilisk_1.10.2       gtable_0.3.3         
[57] codetools_0.2-19      rARPACK_0.11-0        reshape2_1.4.4        R6_2.5.1              gridExtra_2.3         knitr_1.42            fastmap_1.1.1         uwot_0.1.14          
[65] utf8_1.2.3            filelock_1.0.2        stringi_1.7.12        parallel_4.2.3        Rcpp_1.0.10           vctrs_0.6.1           png_0.1-8             tidyselect_1.2.0     
[73] xfun_0.38  

🤔 Expected behavior:

Running time should decrease as the number of workers increases.
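
To rule out a machine-level problem, a check along the following lines could confirm that BiocParallel itself scales outside of perf(). This is only a minimal sketch: `slow_task` and `time_backend` are hypothetical helper names, and the toy task simply sleeps, so it says nothing about how perf() dispatches its work.

```r
library(BiocParallel)

# Toy task (hypothetical helper): each call only sleeps, so any speed-up
# must come from running several calls at the same time.
slow_task <- function(i) {
  Sys.sleep(0.5)
  i
}

# Elapsed wall-clock seconds for n_tasks toy tasks under a given backend
# (hypothetical helper, not part of BiocParallel or mixOmics).
time_backend <- function(BPPARAM, n_tasks = 4) {
  system.time(
    bplapply(seq_len(n_tasks), slow_task, BPPARAM = BPPARAM)
  )[["elapsed"]]
}

timings <- c(
  serial = time_backend(SerialParam()),
  snow2  = time_backend(SnowParam(workers = 2))
)
print(timings)
```

If the SnowParam timing is clearly lower than the serial one here (allowing for the worker start-up cost), the backend works on this machine and the flat timings above would point at perf() rather than at BiocParallel.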


💡 Possible solution:

Not sure. The example provided uses a block.splsda object because that is what I was interested in, but looking at the code, it seems that BiocParallel is not used consistently across all the perf() variants.

The problem looks similar to the lack of parallelisation in the tune() function previously reported in #214.
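
One way to see which perf() variants even accept BPPARAM is to inspect the formal arguments of each method. The sketch below assumes the variants are ordinary S3 methods of perf; `has_arg` is a hypothetical helper, not a mixOmics function. A method that declares BPPARAM may still ignore it internally, so this only narrows the search.

```r
# For every method of an S3 generic, report whether it declares a given
# formal argument. Declaring the argument does not prove the method uses it.
has_arg <- function(generic, arg) {
  method_names <- as.character(utils::methods(generic))
  vapply(method_names, function(nm) {
    objs <- utils::getAnywhere(nm)$objs
    length(objs) > 0 &&
      is.function(objs[[1]]) &&
      arg %in% names(formals(objs[[1]]))
  }, logical(1))
}

# Only run the mixOmics check when the package is installed.
if (requireNamespace("mixOmics", quietly = TRUE)) {
  suppressPackageStartupMessages(library(mixOmics))
  print(has_arg("perf", "BPPARAM"))
}
```

Methods reported as FALSE would be the first candidates to look at, since a BPPARAM passed to them can at best be absorbed by `...`.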