wdl2459 / ConQuR

Batch effects removal for microbiome data via conditional quantile regression
GNU General Public License v3.0
27 stars 4 forks source link

Parallelization does not appear to work #20

Open gregpoore opened 1 year ago

gregpoore commented 1 year ago

Hi there,

I've been trying out ConQurRand like what I see so far. One challenge that I've run into is that the parallelization seems to either not work or be minimally effective, and this makes it very difficult to apply the tool to larger datasets. For example, adapting some code from the tutorial to compare runtime on 2 cores vs. 8 cores vs. 10 cores (on a Macbook Pro):

library(ConQuR)
library(doParallel) 
library(tictoc)

data(Sample_Data)
batchid = Sample_Data[, 'batchid']
covar = Sample_Data[, c('sex', 'race', 'age', 'sbp')]

tic("ConQuR with 2 cores")
taxa_corrected = ConQuR(num_core = 2,
                         tax_tab=taxa, batchid=batchid, 
                         covariates=covar, batch_ref="0")
toc() # --> ConQuR with 2 cores: 30.613 sec elapsed

tic("ConQuR with 8 cores")
taxa_corrected = ConQuR(num_core = 8,
                         tax_tab=taxa, batchid=batchid, 
                         covariates=covar, batch_ref="0")
toc() # --> ConQuR with 8 cores: 28.187 sec elapsed

tic("ConQuR with 10 cores")
taxa_corrected = ConQuR(num_core = 10,
                        tax_tab=taxa, batchid=batchid, 
                        covariates=covar, batch_ref="0")
toc() # --> ConQuR with 8 cores: 28.071 sec elapsed

I've made sure that no other special packages are loaded when running the above, and a gain of ~6% speed-up for 5X as many cores is perplexing.

Do you know if/why parallelization is not speeding things up, and how it can be fixed?

gregpoore commented 1 year ago

@wdl2459 Sorry to bump this, but is there a fix or alternative? I appreciate your insight on how to proceed

gregpoore commented 1 year ago

@wdl2459 It appears that someone else who forked the repo already fixed this. The elapsed times are now:

If it helps, that repo is here: https://github.com/ivartb/ConQuR_par/tree/main/R . I think the main error is that the parallelization should be called by %dopar% rather than %do% on Line 69 (etc.) on the ConQuR_main_tune_libsize.R script