sneumann / xcms

This is the git repository matching the Bioconductor package xcms: LC/MS and GC/MS Data Analysis

Error: stop worker failed: attempt to select less than one element in OneIndex #383

Open Duvancito opened 5 years ago

Duvancito commented 5 years ago

Hello,

I am trying to process 4 NetCDF files (2 QCs and 2 treatment samples) with the xcms package. After installing and loading the xcms library, I tried to run the following code:

peakpicking <- xcmsSet(method = "centWave", peakwidth = c(4, 20), prefilter = c(3, 5000), snthresh = 10, ppm = 15)

But after an hour of processing, I get the following output:

Scanning files in directory G:/Trabajos Indep/Metabolomics_Maria_UCO/Proyecto/UCO_NEG.PRO/Metabolomics/CDF Data_2 ... found 4 files
Loading required package: xcms
Loading required package: Biobase
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: ‘BiocGenerics’
The following objects are masked from ‘package:parallel’: clusterApply, clusterApplyLB, clusterCall, clusterEvalQ, clusterExport, clusterMap, parApply, parCapply, parLapply, parLapplyLB, parRapply, parSapply, parSapplyLB
The following objects are masked from ‘package:stats’: IQR, mad, sd, var, xtabs
The following objects are masked from ‘package:base’: anyDuplicated, append, as.data.frame, basename, cbind, colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank, rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply, union, unique, unsplit, which, which.max, which.min

Welcome to Bioconductor
Vignettes contain introductory material; view with 'browseVignettes()'. To cite Bioconductor, see 'citation("Biobase")', and for packages 'citation("pkgname")'.

Loading required package: BiocParallel
Loading required package: MSnbase
Loading required package: mzR
Loading required package: Rcpp
Loading required package: S4Vectors
Loading required package: stats4

Attaching package: ‘S4Vectors’
The following object is masked from ‘package:base’: expand.grid

Loading required package: ProtGenerics
This is MSnbase version 2.10.0
Visit https://lgatto.github.io/MSnbase/ to get started.

Attaching package: ‘MSnbase’
The following object is masked from ‘package:stats’: smooth
The following object is masked from ‘package:base’: trimws

This is xcms version 3.6.1

Attaching package: ‘xcms’
The following object is masked from ‘package:stats’: sigma

Error: stop worker failed: attempt to select less than one element in OneIndex
In addition: Warning messages:
1: In serialize(data, node$con) : 'package:stats' may not be available when loading
2: In serialize(data, node$con) : 'package:stats' may not be available when loading

So I can't figure out why the code doesn't run in R. I am using:

Could someone help me?

sneumann commented 5 years ago

Hi, judging from the file path you are on Windows, correct? Parallel processing under Windows can be tricky. What happens with

xset <- xcmsSet(method = "centWave", peakwidth = c(4, 20), prefilter = c(3, 5000), snthresh = 10, ppm = 15, BPPARAM = SerialParam())
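[Editor's note] A slightly fuller sketch of the suggested workaround, with hedged assumptions: the call to register() and the BPPARAM argument both force serial execution, and xcmsSet() picks up files from the current working directory when no file list is given (as in the original post).

```r
library(xcms)
library(BiocParallel)

## On Windows, forcing serial execution sidesteps worker start-up
## failures in the SOCK (Snow) parallel backend.
register(SerialParam())

xset <- xcmsSet(method = "centWave",
                peakwidth = c(4, 20), prefilter = c(3, 5000),
                snthresh = 10, ppm = 15,
                BPPARAM = SerialParam())
```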

See also http://stanstrup.github.io/material/presentations/1.%20XCMS.html#/8

Yours, Steffen

Duvancito commented 5 years ago

Best regards, and thank you for the quick answer.

As you said, I am using Windows. After adding BPPARAM = SerialParam() to the R script, the code works perfectly.

I just noticed the following warning after the process: "In .local(object, ...) : It looks like this file is in profile mode. centWave can process only centroid mode data !" After verifying, I can confirm that the data are in centroid mode, so I decided to continue.
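[Editor's note] One way to double-check the acquisition mode, sketched with MSnbase (the file name is a placeholder, and isCentroided() uses a heuristic on the peak data, so treat the result as a hint rather than ground truth):

```r
library(MSnbase)

## Read one file on disk and let MSnbase guess, per spectrum,
## whether the data look centroided.
raw <- readMSData("sample_01.CDF", mode = "onDisk")  # placeholder file name
table(isCentroided(raw))  # TRUE = looks centroided
```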

Thank you Steffen

sneumann commented 5 years ago

Hi, the warning indicates that there were raw peaks (centroids, as you confirmed) which are too close together. If you tighten ppm=15 to, say, ppm=10 or even 5, that might help. Which instrument is that? Yours, Steffen

Duvancito commented 5 years ago

Hi Steffen,

Using ppm = 10, the warning about "profile mode" still appears. I will try lower values.
The equipment is a Waters Acquity XEVO G2-XS QTof.

Kind regards.

shubham1637 commented 4 years ago

Hi, I see a similar error with mzR::chromatograms(mz, chromIndices). I am using a Linux system.

> registered()
$MulticoreParam
class: MulticoreParam
  bpisup: FALSE; bpnworkers: 6; bptasks: 0; bpjobname: BPJOB
  bplog: FALSE; bpthreshold: INFO; bpstopOnError: TRUE
  bpRNGseed: ; bptimeout: 2592000; bpprogressbar: FALSE
  bpexportglobals: TRUE
  bplogdir: NA
  bpresultdir: NA
  cluster type: FORK

The error looks like this:

> list2 <- bplapply(1:100, Package:::fun())
Error: BiocParallel errors
  element index: 9, 10, 11, 12, 13, 14, ...
  first error: [SAXParser::ParserWrangler::elementEnd()] Illegal end tag "binaryDataArray" at offset 2754452.
In addition: Warning message:
stop worker failed:
  attempt to select less than one element in OneIndex

Have you found out any reasoning for this error?

jorainer commented 4 years ago

General info about attempt to select less than one element in OneIndex errors: these occur mostly when you run parallel processing and the system (or R) runs out of memory. One of the workers then seems to fail silently and not pass any result back to the master process (hence the message about selecting less than one element).
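[Editor's note] If memory is the suspect, one way to narrow it down is to shrink the worker pool step by step, with SerialParam() as the extreme case; a minimal sketch:

```r
library(BiocParallel)

## Each forked worker adds to peak memory use, so fewer workers
## means a smaller memory footprint.
register(MulticoreParam(workers = 2))

## To rule out parallel-worker failures entirely:
register(SerialParam())
```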

Your error @shubham1637 is a different one; the attempt to select less than one element in OneIndex is just the consequence of the original error first error: [SAXParser::.... That error is thrown by the ProteoWizard C++ code that mzR uses to read MS raw files. From that error it looks more like a problem with your input file(s), which ProteoWizard seems unable to read.
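[Editor's note] To inspect the underlying per-element error instead of losing it in an aborted run, BiocParallel's bptry()/bpok() can be used; my_fun below is a stand-in for the real worker function:

```r
library(BiocParallel)

## A stand-in worker that fails for one input.
my_fun <- function(i) if (i == 3) stop("boom") else i^2

## bptry() collects partial results instead of stopping, so the
## original condition (e.g. the SAXParser error) stays inspectable.
res <- bptry(bplapply(1:5, my_fun, BPPARAM = SerialParam()))
ok  <- bpok(res)  # logical vector: which elements succeeded
res[!ok]          # condition objects of the failed elements
```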

shubham1637 commented 4 years ago

@jorainer Thanks. When I use lapply() it works fine, so the error may not come from reading the file.

mzs[[i]] <- mzR::openMSfile(filename[i], backend = "pwiz") # Get mzRpwiz object for 16 files.
list2 <- bplapply(1:100, Package:::fun()) # fun() extracts chromatograms for a peptide from all the files and does some operation.

However, as you suggested, R could be running out of memory. I will try to run the program with fewer files and see if that helps. Thanks.

shubham1637 commented 4 years ago

I tried with four files and ran the same code as above. If I use one core, there is never any issue. Once I use more than one core, it sometimes goes through without any error, but many times it results in various errors or gets stuck. The errors look like these:

1) Run with two cores:

BiocParallel::register(BiocParallel::MulticoreParam(workers = 2, log = TRUE, threshold = "TRACE", progressbar = TRUE))

########## LOG OUTPUT ###############
Task: 2
Node: 2
Timestamp: 2020-08-13 16:32:52
Success: FALSE

Task duration:
   user  system elapsed 
 19.372   0.319  19.688 
Memory used:
          used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells 2958176 158.0    5851986 312.6  5851986 312.6
Vcells 8700271  66.4   17096268 130.5 16343570 124.7
Log messages:
stderr and stdout:
ERROR [2020-08-13 16:32:52] bad lexical cast: source type value could not be interpreted as target
Error: BiocParallel errors
  element index: 890, 891, 892, 893, 894, 895, ...
  first error: bad lexical cast: source type value could not be interpreted as target

2) Run with six cores:

BiocParallel::register(BiocParallel::MulticoreParam(workers = 6, log = TRUE, threshold = "TRACE", progressbar = TRUE))

########## LOG OUTPUT ###############
Task: 6
Node: 6
Timestamp: 2020-08-13 16:45:58
Success: FALSE

Task duration:
   user  system elapsed 
  0.403   0.096   0.452 
Memory used:
          used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells 2969397 158.6    5851987 312.6  5851987 312.6
Vcells 8490501  64.8   17096268 130.5 16654835 127.1
Log messages:
stderr and stdout:
ERROR [2020-08-13 16:45:58] [SAXParser::ParserWrangler::elementEnd()] Illegal end tag "binaryDataArray" at offset 19530630.

Error: BiocParallel errors
  element index: 37, 38, 39, 40, 41
  first error: [SAXParser::ParserWrangler::elementEnd()] Illegal end tag "binaryDataArray" at offset 19530630.

It seems the errors are originating from ProteoWizard, as you mentioned, but I am not able to locate their source. Each file has approx. 227k extracted-ion chromatograms.

I am wondering if the problem is due to multiple threads reading the same file, which ProteoWizard perhaps cannot support?

jorainer commented 4 years ago

Instead of opening all file handles at the beginning (note that there may also be a limit on open file handles, either in R or in the operating system), I would suggest calling mzR::openMSfile each time you access the data, and closing the connection again afterwards. That is also how we use mzR in MSnbase, and there we don't have problems with parallel processing.
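[Editor's note] A sketch of that open-per-access pattern (the function name extract_chroms is illustrative; chromatograms() is the mzR accessor mentioned earlier in the thread):

```r
library(mzR)

## Open the file only for the duration of one access, and make
## sure the handle is released even if extraction fails.
extract_chroms <- function(filename, chromIndices) {
  mz <- openMSfile(filename, backend = "pwiz")
  on.exit(close(mz))
  chromatograms(mz, chromIndices)
}
```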

shubham1637 commented 4 years ago

In the case of a huge file (1 GB), mzR::openMSfile takes a while (approx. 1-2 minutes) to return the pwiz object, which will slow the program down quite significantly. I had tried the program on just two runs, and there as well I see the same error messages: https://github.com/sneumann/mzR/issues/228#issuecomment-675507330