xzhoulab / iDEA

Differential expression (DE); gene set Enrichment Analysis (GSEA); single cell RNAseq studies (scRNAseq)
GNU General Public License v3.0

Not able to increase the number of cores to use #7

Closed aricci-n closed 3 years ago

aricci-n commented 4 years ago

I am trying to use iDEA with my dataset, but unfortunately fitting the model is extremely slow.

When I create the iDEA object, I set the number of cores to 10:

idea <- CreateiDEAObject(iDEA_input, mouseGeneSets_reduced, max_var_beta = 100, min_precent_annot = 0.0025, num_core=10)

However, when I check inside the iDEA object, what I see is:

idea@num_core

##output

[1] 1

As a matter of fact, when I try to fit the model, what I get is:

idea <- iDEA.fit(idea,
                 fit_noGS = FALSE,
                 init_beta = NULL,
                 init_tau = c(-2, 0.5),
                 min_degene = 5,
                 em_iter = 15,
                 mcmc_iter = 1000,
                 fit.tol = 1e-5,
                 modelVariant = FALSE,
                 verbose = TRUE)

## ===== iDEA INPUT SUMMARY ==== ##
## number of annotations:  3767 
## number of genes:  11777 
## number of cores:  1 
## fitting the model with gene sets information...

I guess this is why the step is so slow. Any ideas how I could fix this?

Thank you very much in advance!

YingMa0107 commented 4 years ago

Hi @aricci-n

Based on the output, I suspect you are running on Windows. The package was originally developed on Linux and has also been tested successfully on macOS and Windows. It is true that the program forces num_core to 1 on Windows: the package uses pbmcapply for parallel computing, and that parallelization only works on Linux and macOS because Windows lacks fork() (see the package description: https://cran.r-project.org/web/packages/pbmcapply/pbmcapply.pdf). For now, the package runs more efficiently on macOS or Linux. It may take us some time to find a replacement for parallel computing that works on all systems and all R versions. If you can run it on macOS or Linux, that is probably the easiest solution.

Best, Ying
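
The platform restriction described above can be checked from R before creating the iDEA object. The following is a minimal sketch, not part of iDEA: `safe_num_core` is a hypothetical helper name, and it assumes only base R and the bundled `parallel` package. It returns 1 on Windows, where fork-based parallelism is unavailable, and otherwise caps the request at the number of cores R detects.

```r
# Hypothetical helper (not part of iDEA): choose a core count that a
# fork-based backend can actually honor on the current platform.
safe_num_core <- function(requested) {
  # Windows has no fork(), so fork-based parallelism is limited to 1 core.
  if (.Platform$OS.type == "windows") {
    return(1L)
  }
  # detectCores() may return NA on some systems; fall back to 1 in that case.
  available <- parallel::detectCores()
  if (is.na(available)) {
    available <- 1L
  }
  # Never ask for more cores than exist, and never fewer than 1.
  as.integer(max(1, min(requested, available)))
}

n_cores <- safe_num_core(10)
cat("platform:", .Platform$OS.type, "\n")
cat("cores that can be used:", n_cores, "\n")
```

The result could then be passed as `num_core` to `CreateiDEAObject`. On Windows it will always be 1, which matches the `idea@num_core` value reported in the question.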