eschen42 closed this issue 8 years ago
Could you please provide the output of your sessionInfo()? In the devel branch of xcms we switched from the old parallel processing setup (which was quite cumbersome in xcms) to BiocParallel, i.e. BiocParallel takes care of the correct parallel processing setup (whether snow, Rmpi, or parallel is used), which can be configured system-wide.
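As a rough illustration of what such a setup can look like, the sketch below registers a default BiocParallel backend for the session. It assumes BiocParallel is installed; the helper name choose_backend and the worker count of 2 are illustrative only and are not taken from xcms.

```r
# Sketch only: choose_backend() is a hypothetical helper, not part of xcms.
choose_backend <- function(workers = 2L) {
  if (!requireNamespace("BiocParallel", quietly = TRUE)) {
    message("BiocParallel is not installed; nothing registered.")
    return(NULL)
  }
  # Socket (snow-style) workers run on all platforms, including Windows;
  # forked workers (MulticoreParam) are only useful on Unix-like systems.
  param <- if (.Platform$OS.type == "windows") {
    BiocParallel::SnowParam(workers = workers)
  } else {
    BiocParallel::MulticoreParam(workers = workers)
  }
  # register() makes this the default that bpparam() returns.
  BiocParallel::register(param)
  param
}

backend <- choose_backend(2L)
if (!is.null(backend)) print(BiocParallel::bpparam())
```

Functions that accept a BPPARAM argument would then pick up the registered backend by default, so the choice is made once rather than in every call.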
If you're using the release branch you might still be on the old setup. Possibly you're lacking one of the required packages for parallel processing. Try installing snow, parallel, and Rmpi on your Windows machine. I can't remember which one, but only one of those works on Windows, so don't be surprised if not all are available or can be installed:
library(BiocInstaller)
biocLite(c("snow", "parallel", "Rmpi"))
sessionInfo() revealed that "parallel" was already attached; as you said, apparently it is not effective for XCMS.
I did find that adding library("snow") to my script did result in parallel processing, e.g.:
Starting snow cluster with 12 sockets
Detecting features in file # 1: foo.mzXML
Detecting features in file # 2: bar.mzXML
etc.
Page 81 of the package manual (http://bioconductor.org/packages/release/bioc/manuals/xcms/man/xcms.pdf) documents nSlaves as:
nSlaves - number of slaves/cores to be used for parallel peak detection. MPI is used if installed, otherwise the snow package is employed for multicore support. If none of the two packages is available it uses the parallel package for parallel processing on multiple CPUs of the current machine.
Perhaps this could be very slightly more explicit for the naive user, e.g.:
nSlaves - number of slaves/cores to be used for parallel peak detection. Requires at least one of the additional libraries: Rmpi, snow, parallel. If several are loaded, the order of preference is: Rmpi > snow > parallel.
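The proposed order of preference can be sketched in a few lines of R. This is only an illustration of the rule as worded above, assuming "loaded" means the package namespace is loaded in the current session; pick_backend is a hypothetical helper and does not reflect how xcms selects a backend internally.

```r
# Sketch only: pick_backend() is a hypothetical helper illustrating the
# suggested preference order Rmpi > snow > parallel.
pick_backend <- function(candidates = c("Rmpi", "snow", "parallel"),
                         available = loadedNamespaces()) {
  # Keep the candidates that are available, in order of preference.
  hits <- candidates[candidates %in% available]
  if (length(hits) == 0L) NA_character_ else hits[[1L]]
}

# Which backend would be preferred given what is loaded in this session?
print(pick_backend())

# Simulated sessions:
print(pick_backend(available = c("parallel", "snow")))  # snow outranks parallel
print(pick_backend(available = "parallel"))
print(pick_backend(available = character(0)))           # nothing usable: NA
```

Running pick_backend() before and after library("snow") shows whether loading the package would change the preferred backend under this rule.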
Thank you very much for your quick response!
Thanks for your suggestion. Note however that the use of nSlaves will be deprecated in the next release. As noted above we'll switch to BiocParallel for parallel processing; I'll try to enhance the documentation and possibly add a specific section to the vignette.
This may be an issue rather than a bug.
I am running R 3.3.1 on a 24-core, 64-bit Windows virtual machine. I don't know whether the issue is the Windows build or virtualization, but I don't get multithreaded peak-finding. Indeed, when I was working with someone else on a physical multi-core machine, we didn't see any performance gain (or change in behavior) when we changed nSlaves.
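One way to separate an xcms problem from a platform problem is to check, independently of xcms, whether R can see the cores and start worker processes at all. The sketch below uses only the parallel package that ships with R; the worker count of 2 and the task count of 4 are arbitrary choices for illustration.

```r
library(parallel)  # ships with R, no installation needed

# Number of logical cores R can see (may be NA on some platforms).
n_cores <- detectCores()
print(n_cores)

# Start a small socket cluster; this cluster type works on Windows
# as well as on Unix-like systems.
cl <- makeCluster(2L)
# Ask each task for the process ID of the worker that ran it.
worker_pids <- unlist(parLapply(cl, 1:4, function(i) Sys.getpid()))
stopCluster(cl)

# Workers are separate R processes, so none of these PIDs should equal
# the PID of the master session.
print(unique(worker_pids))
print(Sys.getpid() %in% worker_pids)
```

If this runs and reports worker PIDs that differ from the master session, basic socket-based parallelism works on the machine, and the remaining question is which backend xcms picks up.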
Today I fetched xcms with:
source("https://bioconductor.org/biocLite.R")
biocLite("xcms")
No matter what parameters I pass to xcmsSet, and no matter whether I use a GUI or R --vanilla < threadtest.R it always seems to pick peaks one file at a time, e.g.:
By contrast, when I run this under Linux on a two-core, two-thread-per-core physical machine, I get three cores engaged and chromatographic peak detection overlaps with mass trace detection, e.g.:
Detecting mass traces at 2.5 ppm ... % finished: 0 10 20 30 40 50 60 70 80 90 100 1300 m/z ROI's.
Detecting chromatographic peaks ... % finished: 0 Detecting mass traces at 2.5 ppm ... % finished: 0 10 20 30 40 50 60 70 80 90 100 4751 m/z ROI's.
Detecting chromatographic peaks ... % finished: 0 10 20 10 30 Detecting mass traces at 2.5 ppm ... % finished: 0 10 20 30 40 50 60 70 80 90 40 100 3703 m/z ROI's.
Detecting chromatographic peaks ... % finished: 0 20 50 60 10 30 70 20 40 80 30 50 60 40 70 50 90 80 90 60 100 315 Peaks. 70 80 90 100 861 Peaks. 100 800 Peaks.
Perhaps there is something else that I should try differently. Could this possibly be an issue with the Bioconductor build of XCMS for Windows?