grunwaldlab / poppr

🌶 An R package for genetic analysis of populations with mixed (clonal/sexual) reproduction
https://grunwaldlab.github.io/poppr

threads not being respected and defaulting to 1 #249

Open pdimens opened 3 years ago

pdimens commented 3 years ago

Please include a brief description of the problem with a code example:

I have a dataset for which a parallelized AMOVA would dramatically reduce runtime. Setting threads = 20L for 20 threads (or any other value, for that matter) results in only a single thread being used when the script is run via Rscript.

System

Arch Linux, R 4.4.1, poppr 2.9.3:

This is poppr version 2.9.3. To get started, type package?poppr
OMP parallel support: available

# part of the Rscript
amova_results_neut <- poppr.amova(
  bft_neut,
  hier = ~year/population,
  clonecorrect = FALSE,
  within = TRUE,
  squared = TRUE,
  correction = "quasieuclid",
  algorithm = "farthest_neighbor",
  threads = 20L,
  missing = "loci",
  cutoff = 0.1,
  quiet = FALSE,
  method = "pegas",
  nperm = 50000
)

and the output of the top command:

   PID     USER  NI    RES    SHR S  %CPU  %MEM     TIME+                        COMMAND
156535  pdimens   0 723056  19884 R 100.0   0.3   8:13.11                              R

zkamvar commented 3 years ago

The threads argument refers to the number of cores used to filter the data and/or calculate the distance before it is passed to the AMOVA function in the imported package (in this case, {pegas}). After that, the {pegas} implementation runs serially because it does not yet support parallelization (which, for several reasons, is not so simple to do in a cross-platform way in R).

Since the permutation procedure is buried inside the {pegas} implementation of AMOVA, there's not much I can do to improve that timing. My suggestion would be to request that the author of {pegas} add the possibility for parallelization in the AMOVA function (which should be possible with the {future} package).
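
For anyone curious what parallelizing a permutation procedure could look like in general, here is a minimal sketch. It is not the {pegas} AMOVA code and does not touch its internals. It uses the base {parallel} package rather than {future} so that it runs without extra installs, and the statistic, data, and helper names (group_stat, permute_chunk) are toy stand-ins invented for illustration. The idea is simply to split the permutations into chunks, run each chunk in its own worker process, and pool the permuted statistics afterwards.

```r
# Minimal sketch: splitting a permutation test across worker processes.
# This is NOT the {pegas} AMOVA permutation code; the statistic and data are
# toy stand-ins chosen only to illustrate the pattern.
library(parallel)

# Toy statistic (hypothetical): absolute difference in group means.
group_stat <- function(values, groups) {
  abs(mean(values[groups == "a"]) - mean(values[groups == "b"]))
}

# Run `n` permutations of the group labels and return the permuted statistics.
permute_chunk <- function(n, values, groups, stat) {
  vapply(seq_len(n), function(i) stat(values, sample(groups)), numeric(1))
}

set.seed(1)
values <- c(rnorm(30, mean = 0), rnorm(30, mean = 1))
groups <- rep(c("a", "b"), each = 30)
observed <- group_stat(values, groups)

n_workers <- 2L
nperm <- 2000L
chunks <- rep(nperm %/% n_workers, n_workers)  # permutations per worker

# A socket cluster works on Linux, macOS, and Windows.
cl <- makeCluster(n_workers)
clusterSetRNGStream(cl, iseed = 42)  # independent, reproducible RNG streams
permuted <- unlist(parLapply(
  cl, chunks, permute_chunk,
  values = values, groups = groups, stat = group_stat
))
stopCluster(cl)

# Permutation p-value with the usual +1 correction.
p_value <- (sum(permuted >= observed) + 1) / (length(permuted) + 1)
print(p_value)
```

Each worker gets its own random number stream via clusterSetRNGStream(), so the chunks do not repeat the same permutations. Whether the same pattern can be applied to the AMOVA permutations depends on how {pegas} structures them internally, which this sketch does not address.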