mixOmicsTeam / mixOmics

Development repository for the Bioconductor package 'mixOmics '
http://mixomics.org/
159 stars 52 forks source link

No parallelization in tune.spls #214

Closed EmileFaure closed 1 year ago

EmileFaure commented 2 years ago

🐞 Describe the bug:

tune.spls is not using multicore processing even with BPPARAM set correctly.


πŸ” reprex results from reproducible example including sessioninfo():

Monitoring of this code from the package vignette:

param = MulticoreParam(workers=5)
bpstart(param)
data(liver.toxicity)
X <- liver.toxicity$gene
Y <- liver.toxicity$clinic
set.seed(42)
tune.res = tune.spls( X, Y, ncomp = 3,
                      test.keepX = c(5, 10, 15),
                      test.keepY = c(3, 6, 8), measure = "cor",
                      folds = 5, nrepeat = 3, progressBar = TRUE, BPPARAM = param)

--> Only one CPU is used

tune.res = tune.spca( X, ncomp = 3,
                      test.keepX = c(5, 10, 15),
                      folds = 5, nrepeat = 3, BPPARAM = param)

--> Multiple CPUs are used in parallel.

Session info:

R version 4.1.2 (2021-11-01)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 10.16

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] mixOmics_6.18.1     ggplot2_3.3.5       lattice_0.20-45     MASS_7.3-57        
[5] BiocParallel_1.28.3

loaded via a namespace (and not attached):
 [1] ggrepel_0.9.1        Rcpp_1.0.8.3         lubridate_1.8.0      tidyr_1.2.0         
 [5] corpcor_1.6.10       listenv_0.8.0        class_7.3-20         snow_0.4-4          
 [9] assertthat_0.2.1     digest_0.6.29        ipred_0.9-12         foreach_1.5.2       
[13] utf8_1.2.2           RSpectra_0.16-1      parallelly_1.31.1    R6_2.5.1            
[17] plyr_1.8.7           hardhat_0.2.0        stats4_4.1.2         ellipse_0.4.2       
[21] pillar_1.7.0         rlang_1.0.2          caret_6.0-92         rstudioapi_0.13     
[25] data.table_1.14.2    rpart_4.1.16         Matrix_1.4-1         rARPACK_0.11-0      
[29] splines_4.1.2        gower_1.0.0          stringr_1.4.0        igraph_1.3.1        
[33] munsell_0.5.0        tinytex_0.38         compiler_4.1.2       xfun_0.30           
[37] pkgconfig_2.0.3      globals_0.14.0       nnet_7.3-17          tidyselect_1.1.2    
[41] gridExtra_2.3        tibble_3.1.6         prodlim_2019.11.13   matrixStats_0.62.0  
[45] codetools_0.2-18     fansi_1.0.3          future_1.25.0        crayon_1.5.1        
[49] dplyr_1.0.9          withr_2.5.0          recipes_0.2.0        ModelMetrics_1.2.2.2
[53] grid_4.1.2           nlme_3.1-157         gtable_0.3.0         lifecycle_1.0.1     
[57] DBI_1.1.2            magrittr_2.0.3       pROC_1.18.0          scales_1.2.0        
[61] future.apply_1.9.0   cli_3.3.0            stringi_1.7.6        reshape2_1.4.4      
[65] timeDate_3043.102    ellipsis_0.3.2       generics_0.1.2       vctrs_0.4.1         
[69] lava_1.6.10          RColorBrewer_1.1-3   iterators_1.0.14     tools_4.1.2         
[73] glue_1.6.2           purrr_0.3.4          parallel_4.1.2       survival_3.3-1      
[77] colorspace_2.0-3     BiocManager_1.30.17 

πŸ€” Expected behavior:

tune.spls should be parallelized over the workers registered in the BiocParallel environment.


πŸ’‘ Possible solution:

In source code of tune.spca, the call of iterations over the keeping of X is made through bplapply: test.keepX.cors <- bplapply(X=all.keepX, FUN=iter_keepX, BPPARAM = BPPARAM)

In the one of tune.spls, there seem to be no use of any of the basic functions of BiocParallel. It looks to me as the presence of the BiocParallel backend is only playing a role on whether the progress bar is printed or not. If I'm wrong, do you have any idea of why this could happen on my system ? Thank you very much !

Max-Bladen commented 2 years ago

Thank you for your post @EmileFaure. What you've noted is totally correct. After I spent a bit of time exploring the source code, I've realised that this issue a bit larger than just a lack of parallelisation in tune.spls(). If you're interested, feel free to read the information in Issue #216.

As far as it should concern you, I will attempt to rectify the lack of any parallelisation in tune.spls() prior to any other fixes/enhancements.

Max-Bladen commented 2 years ago

G'day @EmileFaure. Thought I'd let you know that I've implemented a fix for this issue and that it is accessible on branch issue-214. Please let me know if there are any issues you run into