rietho / IPO

A Tool for automated Optimization of XCMS Parameters
http://bioconductor.org/packages/IPO/

Error "checkForRemoteErrors(val) : 2 nodes produced errors" when using IPO #33

Closed jamesrco closed 7 years ago

jamesrco commented 8 years ago

Hello there,

I am attempting to use the IPO package to optimize some settings for an algal lipid dataset I'm working with. The .mzXML data files I use in the code below (the most "well-behaved" 4 of 18 total in the dataset) are online at https://github.com/vanmooylipidomics/LOBSTAHS/Pt_H2O2_mzXML_ms1_pos/0_uM_H2O2/

When I run the following code:

# optim_xcmsParams.R
#
# Created 11/28/2015 by J.R.C.
#
# Purpose: Optimize parameters for several commands in the xcms package using the R package "IPO." Currently, the script is written to optimize parameters for peak-picking, alignment, etc., of lipid data from the experiment described in Graff van Creveld et al., 2015, "Early perturbation in mitochondria redox homeostasis in response to environmental stress predicts cell fate in diatoms," ISME Journal 9:385-395. This dataset is used to demonstrate the LOBSTAHS lipidomics pipeline in Collins, J.R., B.R. Edwards, H.F. Fredricks, and B.A.S. Van Mooy, 2015, "Untargeted discovery and identification of oxidative stress biomarkers using a lipidomics pipeline for complex datasets."
#
# IPO is described in Libiseller et al., 2015, "IPO: a tool for automated optimization of XCMS parameters," BMC Bioinformatics 16:118; see https://github.com/glibiseller/IPO/blob/master/vignettes/IPO.Rmd for installation instructions
#
# See https://github.com/vanmooylipidomics/LOBSTAHS for current versions of all pipeline scripts

################ Initial setup and variable definition #############

# load required packages

library(tools) 

library(xcms)

library(CAMERA)

library(rsm)

# run two lines below only if IPO hasn't been installed already

# library(devtools)
# install_github("glibiseller/IPO") 

library(IPO)

library(snowfall) # if multicore tasking is desired

################# User: define locations of data files and database(s) #############

working_dir = "/Users/jrcollins/Dropbox/code/LOBSTAHS/" # specify working directory
setwd(working_dir) # set working directory to working_dir

# specify directories subordinate to the working directory in which the .mzXML files for xcms can be found; per xcms documentation, use subdirectories within these to divide files according to treatment/primary environmental variable (e.g., station number along a cruise transect) and file names to indicate timepoint/secondary environmental variable (e.g., depth)
mzXMLfiles_folder_pos = "Pt_H2O2_mzXML_ms1_pos/" 
mzXMLfiles_folder_neg = "Pt_H2O2_mzXML_ms1_neg/" 

################# Load in mzXML files #############

mzXMLfiles = list.files(mzXMLfiles_folder_pos, recursive = TRUE, full.names = TRUE)

# # exclude any files you don't want to push through xcms (e.g., blanks); note that the blanks for the Pt H2O2 dataset (Orbi_0481.mzXML and Orbi_0482.mzXML) have already been removed
# mzXMLfiles = mzXMLfiles[-c(1,2)]

################# Use IPO to optimize some of xcms parameters #############

# will use IPO to optimize settings for method = centWave

# define ranges of parameters to be tested
# if single value is specified for a parameter, or centWave default is used, that parameter will not be optimized

peakpickingParameters <- getDefaultXcmsSetStartingParams('centWave')
peakpickingParameters$min_peakwidth <- c(10,20) # centerpoint is 15
peakpickingParameters$max_peakwidth <- c(40,80) # centerpoint is 60
peakpickingParameters$ppm <- c(1.5,3.5)
peakpickingParameters$prefilter <- c(3,5)
peakpickingParameters$value_of_prefilter <- c(5000,10000)
peakpickingParameters$snthresh <- c(3,10)
peakpickingParameters$noise <- c(5000)

# only going to use the first 4 files from the dataset (0 uM H2O2 treatment) for optimization routine

resultPeakpicking <- optimizeXcmsSet(files= mzXMLfiles[1:4], 
                                     params=peakpickingParameters, nSlaves=4, subdir='rsmDirectory')
optimizedXcmsSetObject <- resultPeakpicking$best_settings$xset

Things start working just fine:

starting new DoE with:
min_peakwidth: c(10, 20)
max_peakwidth: c(40, 80)
ppm: c(1.5, 3.5)
mzdiff: c(-0.001, 0.01)
snthresh: c(3, 10)
noise: 5000
prefilter: c(3, 5)
value_of_prefilter: c(5000, 10000)
mzCenterFun: wMean
integrate: 1
fitgauss: FALSE
verbose.columns: FALSE
nSlaves: 1

It runs for a few hours. But then, I receive this error:

Error in checkForRemoteErrors(val) : 
  2 nodes produced errors; first error: m/z sort assumption violated ! (scan 121, p 1756, current 4.0416 (I=0.00), last 282.9776) 
> optimizedXcmsSetObject <- resultPeakpicking$best_settings$xset
Error: object 'resultPeakpicking' not found
> resultPeakpicking
Error: object 'resultPeakpicking' not found

Any ideas what's happening? I am very excited about the idea of optimizing some of these parameters via objective functions, but I can't figure out what the problem is!

Thanks very much, in advance.

Jamie Collins
Woods Hole Oceanographic Institution

jamesrco commented 8 years ago

Some further clarification: If I use a different combination of sample files to run the optimization routine, there are fewer (or no) m/z sort assumption errors. But, I cannot figure out why two files -- those indexed by mzXMLfiles[c(2,5)] -- appear to throw up the error, while the others don't. I can't figure out what is wrong with the particular scan #'s in those files (scan 121 in file 2, and scan 107 in file 5) that is causing the code to choke.
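One way to look at the scans named in the error is to read a file with xcms and test whether the m/z values of each scan are in ascending order. The sketch below is only an illustration: `find_unsorted_scans` is a hypothetical helper, not part of xcms or IPO, and it assumes the scan numbers in the error message correspond to the scan indices of an `xcmsRaw` object, which may not be the case.

```r
# Hypothetical helper (not part of xcms or IPO): given a list of numeric
# m/z vectors, one per scan, return the indices of scans whose m/z values
# are not in ascending order.
find_unsorted_scans <- function(mz_list) {
  which(vapply(mz_list, is.unsorted, logical(1)))
}

# Usage sketch. Assumes xcms is installed and mzXMLfiles is defined as in
# the script above; skipped otherwise.
if (requireNamespace("xcms", quietly = TRUE) && exists("mzXMLfiles")) {
  raw <- xcms::xcmsRaw(mzXMLfiles[2])
  n_scans <- length(raw@scantime)
  # getScan() returns a matrix with columns "mz" and "intensity"
  mz_list <- lapply(seq_len(n_scans), function(i) xcms::getScan(raw, i)[, "mz"])
  print(find_unsorted_scans(mz_list))
}
```

If this reports no unsorted scans, the raw data are probably not the problem by themselves, and the parameter combination or the parallel run would be the next things to check.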

sneumann commented 8 years ago

Can you put links to just the two offending files here? Thanks, Steffen

jamesrco commented 8 years ago

Hi Steffen,

Sure, no problem.

mzXMLfiles[2]: https://github.com/vanmooylipidomics/LOBSTAHS/blob/master/Pt_H2O2_mzXML_ms1_pos/0_uM_H2O2/0uM_24h_Orbi_0473.mzXML
mzXMLfiles[5]: https://github.com/vanmooylipidomics/LOBSTAHS/blob/master/Pt_H2O2_mzXML_ms1_pos/0_uM_H2O2/0uM_8h_Orbi_0472.mzXML

Curious to see if you can figure out why the code is choking on the two particular scans in these files. Or, maybe it's a weird combination of these files together?

Respectfully, Jamie

sneumann commented 8 years ago

So the error comes from xcms, and has nothing to do with IPO. I checked by putting the two files in a directory and running xs <- xcmsSet(method="centWave", ppm=5), which worked (Linux, xcms_1.45.7). Of course, with unoptimised settings :-) So I'd say it must be other files. But maybe this way you get the culprits faster than trying with IPO in the mix. Yours, Steffen
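A rough sketch of that idea, for anyone following along: run peak picking on each file separately and collect the errors, so the culprit files show up without a multi-hour IPO run. `find_failing_files` is a hypothetical helper, and the centWave values in the usage part are placeholders taken from the low end of the ranges in the script above, not recommended settings.

```r
# Hypothetical helper: call pick_fun on each file separately and return a
# named character vector (file path -> error message) for the files that fail.
find_failing_files <- function(files, pick_fun) {
  errors <- vapply(files, function(f) {
    tryCatch({
      pick_fun(f)
      NA_character_              # NA marks a file that was processed cleanly
    }, error = function(e) conditionMessage(e))
  }, character(1))
  errors[!is.na(errors)]
}

# Usage sketch. Assumes xcms is installed and mzXMLfiles is defined as in
# the script above; skipped otherwise. Parameter values are placeholders.
if (requireNamespace("xcms", quietly = TRUE) && exists("mzXMLfiles")) {
  pick <- function(f) {
    xcms::xcmsSet(f, method = "centWave", ppm = 1.5,
                  peakwidth = c(10, 40), snthresh = 3)
  }
  print(find_failing_files(mzXMLfiles, pick))
}
```

Repeating the loop with a few of the parameter combinations IPO tests would show whether the error depends on the settings rather than on the files alone.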

jamesrco commented 8 years ago

Thanks! Before I even began using IPO, I was able to create an xcmsSet with all 18 files in the dataset without any problem. So, I am confused as to why the error only seems to get thrown up when I'm running the optimization routine. Maybe there is something about those two files that conflicts with one of the parameter values that IPO is testing? Is that possible?

sneumann commented 8 years ago

I can confirm that the 18 files in pos work in a single xcmsSet. Can you try without parallelisation?
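A minimal sketch of such a serial run, assuming the `nSlaves` argument of `optimizeXcmsSet` behaves as in the script above (newer IPO releases may use a different argument for parallelisation). `capture_run` is a hypothetical wrapper, added so that a failure returns the error message instead of leaving the result variable undefined.

```r
# Hypothetical wrapper: call fun() and return either its value or the
# error message, so the result object always exists afterwards.
capture_run <- function(fun) {
  tryCatch(
    list(result = fun(), error = NULL),
    error = function(e) list(result = NULL, error = conditionMessage(e))
  )
}

# Usage sketch. Assumes IPO is installed and that mzXMLfiles and
# peakpickingParameters are defined as in the script above; skipped otherwise.
if (requireNamespace("IPO", quietly = TRUE) &&
    exists("mzXMLfiles") && exists("peakpickingParameters")) {
  serial_run <- capture_run(function() {
    IPO::optimizeXcmsSet(files = mzXMLfiles[1:4],
                         params = peakpickingParameters,
                         nSlaves = 1,          # single process, no parallel workers
                         subdir = "rsmDirectory")
  })
  print(serial_run$error)
}
```

If the serial run completes while the 4-worker run fails, that would point toward the parallel setup rather than the data files.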

jamesrco commented 8 years ago

I will give it a try today... May take a little while without any parallelization!