rietho / IPO

A Tool for automated Optimization of XCMS Parameters
http://bioconductor.org/packages/IPO/

Unpredictable behaviour with library(Rmpi) installed #27

Closed: tobigithub closed this issue 8 years ago

tobigithub commented 8 years ago

Hi, I successfully ran the R scripts from the original paper's supplement (see the wiki) without any issues. However, after installing Rmpi (library(Rmpi)), I got the following error:

starting new DoE with:
min_peakwidth:  c(12, 28)       max_peakwidth:  c(35, 65)       ppm:    c(17, 32)       mzdiff: c(-0.001, 0.01) snthresh:       10      noise:  0       prefilter:      3       value_of_prefilter:     100     mzCenterFun:    wMean   integrate:      1       fitgauss:       FALSE   verbose.columns:        FALSE   nSlaves:        1       
Error in mpi.spawn.Rslaves(nslaves = nSlaves, needlog = FALSE) : 
  Spawning is not implemented. Please use mpiexec with Rprofile.
Timing stopped at: 0.44 0.32 1134.28 

I also wonder why it complained only after 18 minutes instead of telling me up front. It correctly spawned 32 Rscript processes and only broke down later. Also, I did not set nSlaves = 1; I think I had it set to 32 (whether or not that makes sense).

Very surprising behavior.

Cheers Tobias

glibiseller commented 8 years ago

Hi,

Thanks for the information about your nSlaves settings. It gave me a hint where the problem may be.

To avoid relying solely on the estimate given by the response surface model, an xcmsSet object is computed at the end of each DoE so that the PPS can be calculated exactly. This is done by the function calculateXcmsSet(). For the best possible parallelization, this function calls xcmsSet with nSlaves = params$nSlaves * nSlaves. So although params$nSlaves was set to 1, IPO's own nSlaves was set to 32, and therefore xcms's Rmpi parallelization was invoked with 32 processes.
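The nSlaves arithmetic described above can be sketched as follows. This is a hypothetical illustration, not IPO's actual source: `effective_nslaves()` is an invented name, and the only detail taken from this thread is the product params$nSlaves * nSlaves that calculateXcmsSet() passes on to xcmsSet.

```r
# Hypothetical illustration, not IPO's source code: the number of worker
# processes xcmsSet() ends up requesting inside calculateXcmsSet(), assuming
# it is simply params$nSlaves multiplied by IPO's own nSlaves argument.
effective_nslaves <- function(params, nSlaves) {
  params$nSlaves * nSlaves
}

# Tobias's setup: params$nSlaves = 1, but IPO's nSlaves = 32
params <- list(nSlaves = 1)
effective_nslaves(params, nSlaves = 32)  # 32
```

Because the two values multiply, a small params$nSlaves does not cap the worker count when IPO's own nSlaves is large.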

Although calculateXcmsSet() is most likely where the error occurs, I don't know why it fails.

Does this example work on your machine?

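library(IPO)  # provides calculateXcmsSet()

# raw files shipped with the mtbls2 data package
mtbls2files <- list.files(file.path(find.package("mtbls2"), "mzData"),
                          full.names = TRUE)

params <- list(min_peakwidth = 12, max_peakwidth = 30, ppm = 30,
               mzdiff = -0.001, snthresh = 10, noise = 10000, prefilter = 3,
               value_of_prefilter = 100, mzCenterFun = "wMean", integrate = 1,
               fitgauss = FALSE, verbose.columns = FALSE, nSlaves = 2)

# run peak picking on the first two files with the parameters above
xset <- calculateXcmsSet(mtbls2files[1:2], params)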

Cheers Gunnar

tobigithub commented 8 years ago

Hi, yes, the example above runs when copied as is; the output is below. However, no child processes are spawned: with Rmpi not installed, setting nSlaves to 2, 4, 8 or 666 has no effect. No error occurs either; everything runs fine and takes around 20 seconds.

 Detecting mass traces at 30 ppm ... 
 % finished: 0 10 20 30 40 50 60 70 80 90 100 
 217 m/z ROI's.

 Detecting chromatographic peaks ... 
 % finished: 0 10 20 30 40 50 60 70 80 90 100 
 194  Peaks.

 Detecting mass traces at 30 ppm ... 
 % finished: 0 10 20 30 40 50 60 70 80 90 100 
 211 m/z ROI's.

 Detecting chromatographic peaks ... 
 % finished: 0 10 20 30 40 50 60 70 80 90 100 
 196  Peaks.
> 

Actually, looking at https://github.com/cran/Rmpi/blob/master/R/Rparutilities.R, this is intended behavior: on Windows, mpi.spawn.Rslaves() deliberately stops with this error (excerpt below). So it comes down to the general state of parallel computing with R on Windows, where some packages work and some don't, and some are problematic. I have MPI and MPICH running outside of R without problems. There are a number of parallel packages, such as foreach, parallel, snow, PVM, Rmpi, multicore and nws, but which of them run under Windows is not clear. For me it's just the good old "R dependency hell".

 # excerpt from mpi.spawn.Rslaves() in Rmpi's R/Rparutilities.R
 if (.Platform$OS == "windows") {
        stop("Spawning is not implemented. Please use mpiexec with Rprofile.")
 }
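Given that check, a run could detect the unsupported combination before it starts rather than minutes into it. The sketch below is hypothetical: `choose_nslaves()` is not part of IPO, xcms or Rmpi. It assumes only what this thread shows (Rmpi refuses to spawn on Windows, and without Rmpi installed nSlaves has no effect) and uses `system.file()` to test whether Rmpi is installed without loading it.

```r
# Hypothetical helper (not part of IPO, xcms or Rmpi): pick an nSlaves value
# before starting a long optimization, so an unsupported setup is caught
# immediately instead of minutes into the run.
choose_nslaves <- function(requested,
                           os_type = .Platform$OS.type,
                           rmpi_installed = nzchar(system.file(package = "Rmpi"))) {
  requested <- as.integer(requested)
  if (requested > 1L && os_type == "windows" && rmpi_installed) {
    # Rmpi's mpi.spawn.Rslaves() stops on Windows, so fall back to serial.
    warning("Rmpi cannot spawn slaves on Windows; using nSlaves = 1")
    return(1L)
  }
  requested
}

choose_nslaves(32, os_type = "unix", rmpi_installed = TRUE)  # 32
```

The OS type and Rmpi availability are arguments with defaults so the decision can be checked without actually being on Windows.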

Cheers Tobias