Open Duvancito opened 5 years ago
Hi, judging from the file path you are on Windows, correct?
Parallel processing under Windows can be tricky. What happens with
xset <- xcmsSet(method="centWave", peakwidth=c(4,20), prefilter=c(3,5000),snthresh=10,ppm=15, BPPARAM = SerialParam())
See also http://stanstrup.github.io/material/presentations/1.%20XCMS.html#/8
Yours, Steffen
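For reference, a minimal sketch of switching BiocParallel to serial processing for a whole session, assuming the BiocParallel package is installed. The toy bplapply() call is purely illustrative and stands in for the real peak detection; it is not part of xcms.

```r
library(BiocParallel)

# Make a serial backend the default, so every function that relies on
# bpparam() runs in the current R session instead of on worker processes.
register(SerialParam())

# Illustrative stand-in for real work: runs sequentially because the
# registered default backend is now SerialParam.
res <- bplapply(1:4, function(i) i^2)
```

Registering the backend once avoids having to pass BPPARAM = SerialParam() to every call, at the cost of giving up parallel speed-ups for the rest of the session.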
Best regards, and thank you for the quick answer.
As you said, I am using Windows. After adding BPPARAM = SerialParam() to the R script, the code works perfectly.
The only thing I noticed after the run was the following warning: In .local(object, ...) : It looks like this file is in profile mode. centWave can process only centroid mode data ! After checking, I can confirm that the data are in centroid mode, so I decided to continue.
Thank you Steffen
Hi, the warning indicates that there were raw peaks (centroids, as you confirmed) which are too close together. If you tighten ppm=15 to, maybe, ppm=10 or even 5, that might help. Which instrument is that? Yours, Steffen
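To give a feeling for what these ppm values mean in absolute terms, here is a small sketch that converts a ppm tolerance into an m/z window. The helper ppm_to_mz() and the example m/z of 500 are made up for illustration; this is plain arithmetic, not the centWave implementation.

```r
# Hypothetical helper: absolute m/z tolerance for a relative ppm tolerance.
ppm_to_mz <- function(mz, ppm) mz * ppm / 1e6

# Tolerances at an example m/z of 500 for the ppm values discussed here.
# 15 ppm gives 0.0075, 10 ppm gives 0.005 and 5 ppm gives 0.0025.
tolerances <- sapply(c(15, 10, 5), function(p) ppm_to_mz(500, p))
```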
Hi Steffen,
With ppm = 10, the "profile mode" warning still appears. I will try lower values.
The equipment is a Waters Acquity XEVO G2-XS QTof.
Kind regards.
Hi, I see a similar error with mzR::chromatograms(mz, chromIndices). I am using a Linux system.
> registered()
$MulticoreParam
class: MulticoreParam
bpisup: FALSE; bpnworkers: 6; bptasks: 0; bpjobname: BPJOB
bplog: FALSE; bpthreshold: INFO; bpstopOnError: TRUE
bpRNGseed: ; bptimeout: 2592000; bpprogressbar: FALSE
bpexportglobals: TRUE
bplogdir: NA
bpresultdir: NA
cluster type: FORK
Error is like:
> list2 <- bplapply(1:100, Package:::fun())
Error: BiocParallel errors
element index: 9, 10, 11, 12, 13, 14, ...
first error: [SAXParser::ParserWrangler::elementEnd()] Illegal end tag "binaryDataArray" at offset 2754452.
In addition: Warning message:
stop worker failed:
attempt to select less than one element in OneIndex
Have you found a cause for this error?
Some general info about attempt to select less than one element in OneIndex errors: these occur mostly if you run parallel processing and the system (or R) runs out of memory. One of the workers then seems to fail silently and does not pass any result back to the master process (hence the message about select less than one element).
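If memory is the suspect, one thing to try is running with fewer workers, so that fewer copies of the data are held at the same time. A minimal sketch, assuming BiocParallel is installed; the worker count of 2 and the toy function are arbitrary choices for illustration.

```r
library(BiocParallel)

# Assumption: 2 workers is just an example value; pick what your RAM allows.
n_workers <- 2L

# MulticoreParam relies on forking, which is not available on Windows,
# so fall back to a socket-based SnowParam there.
param <- if (.Platform$OS.type == "windows") {
  SnowParam(workers = n_workers)
} else {
  MulticoreParam(workers = n_workers)
}

# Illustrative stand-in for the real per-file processing.
res <- bplapply(1:4, function(i) i * 2, BPPARAM = param)
```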
Your error, @shubham1637, is a different one: the attempt to select less than one element in OneIndex message is just a consequence of the original error, first error: [SAXParser::.... That error is thrown by the proteowizard C++ code that mzR uses to read MS raw files. It looks more like a problem with your input file(s), which proteowizard seems to be unable to read.
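To separate the original error from follow-up messages, it can help to collect per-element results with bptry() and check them with bpok(). A minimal sketch, assuming BiocParallel is installed; the failing function is made up and only simulates an error in one element.

```r
library(BiocParallel)

# Hypothetical worker function: element 3 fails, the others succeed.
fun <- function(i) {
  if (i == 3) stop("simulated reader error")
  i
}

# stop.on.error = FALSE lets the remaining elements run after a failure;
# bptry() returns the partial results instead of throwing.
res <- bptry(bplapply(1:4, fun, BPPARAM = SerialParam(stop.on.error = FALSE)))

# Logical vector marking which elements finished without error.
ok <- bpok(res)

# Messages of the failed elements, i.e. the "first error" and any others.
messages <- vapply(res[!ok], conditionMessage, character(1))
```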
@jorainer Thanks. I used lapply() and it works fine, so the error may not come from reading the file.
mzs[[i]] <- mzR::openMSfile(filename[i], backend = "pwiz") # Get mzRpwiz object for 16 files.
list2 <- bplapply(1:100, Package:::fun()) # fun() extracts chromatograms for a peptide from all the files and does some operation.
However, as you suggested, R could be running out of memory. I will try to run the program with fewer files and see if that helps. Thanks
I tried with four files and ran the same code as above. If I use one core, there is never any issue. As soon as I use more than one core, it sometimes goes through without any error, but many times it results in various errors or gets stuck.
Errors are like these:
1) Run with two cores BiocParallel::register(BiocParallel::MulticoreParam(workers = 2, log = TRUE, threshold = "TRACE", progressbar = TRUE))
########## LOG OUTPUT ###############
Task: 2
Node: 2
Timestamp: 2020-08-13 16:32:52
Success: FALSE
Task duration:
user system elapsed
19.372 0.319 19.688
Memory used:
used (Mb) gc trigger (Mb) max used (Mb)
Ncells 2958176 158.0 5851986 312.6 5851986 312.6
Vcells 8700271 66.4 17096268 130.5 16343570 124.7
Log messages:
stderr and stdout:
ERROR [2020-08-13 16:32:52] bad lexical cast: source type value could not be interpreted as target
Error: BiocParallel errors
element index: 890, 891, 892, 893, 894, 895, ...
first error: bad lexical cast: source type value could not be interpreted as target
2) Run with six cores BiocParallel::register(BiocParallel::MulticoreParam(workers = 6, log = TRUE, threshold = "TRACE", progressbar = TRUE))
########## LOG OUTPUT ###############
Task: 6
Node: 6
Timestamp: 2020-08-13 16:45:58
Success: FALSE
Task duration:
user system elapsed
0.403 0.096 0.452
Memory used:
used (Mb) gc trigger (Mb) max used (Mb)
Ncells 2969397 158.6 5851987 312.6 5851987 312.6
Vcells 8490501 64.8 17096268 130.5 16654835 127.1
Log messages:
stderr and stdout:
ERROR [2020-08-13 16:45:58] [SAXParser::ParserWrangler::elementEnd()] Illegal end tag "binaryDataArray" at offset 19530630.
Error: BiocParallel errors
element index: 37, 38, 39, 40, 41
first error: [SAXParser::ParserWrangler::elementEnd()] Illegal end tag "binaryDataArray" at offset 19530630.
It seems the errors originate from proteowizard, as you mentioned, but I am not able to locate their source. Each file has approx. 227k extracted-ion chromatograms.
I am wondering whether the problem is due to multiple threads reading the same file, which proteowizard probably can't support?
Instead of opening all file handles at the beginning (note that there might also be a limit on open file handles, either in R or in the operating system), I would suggest calling mzR::openMSfile each time you access the data and closing the connection afterwards. That's also how we use mzR in MSnbase, and there we don't have problems with parallel processing.
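A rough sketch of that open-read-close pattern, assuming mzR and BiocParallel are installed. The helper read_chromatograms(), the empty files vector and the index range 1:10 are placeholders for illustration, not code from MSnbase.

```r
library(mzR)
library(BiocParallel)

# Hypothetical input: replace with the paths to your own files.
files <- character(0)

# Hypothetical helper: opens the file inside the worker, reads the requested
# chromatograms and closes the handle again, even if reading fails.
read_chromatograms <- function(path, chrom_indices) {
  ms <- mzR::openMSfile(path, backend = "pwiz")
  on.exit(mzR::close(ms), add = TRUE)
  mzR::chromatograms(ms, chrom_indices)
}

# Parallelise over files, so no file handle is shared between workers.
res <- bplapply(files, read_chromatograms, chrom_indices = 1:10,
                BPPARAM = bpparam())
```

Splitting the work by file rather than by chromatogram means each file is opened only once per run, which keeps the cost of openMSfile manageable.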
For a huge file (1 GB), mzR::openMSfile takes a while (approx. 1-2 minutes) to return the pwiz object, which slows the program down quite significantly. I also tried the program on only two runs and see the same error messages there:
https://github.com/sneumann/mzR/issues/228#issuecomment-675507330
Best regards,
I am trying to process 4 NetCDF files (2 QCs and 2 treatment samples) with the xcms package. After installing and loading the xcms library, I tried to run the following code:
peakpicking<-xcmsSet(method="centWave",peakwidth=c(4,20), prefilter=c(3,5000),snthresh=10,ppm=15)
But after 1 hour of computation, I got the following output:
Scanning files in directory G:/Trabajos Indep/Metabolomics_Maria_UCO/Proyecto/UCO_NEG.PRO/Metabolomics/CDF Data_2 ... found 4 files
Loading required package: xcms
Loading required package: Biobase
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: ‘BiocGenerics’
The following objects are masked from ‘package:parallel’: clusterApply, clusterApplyLB, clusterCall, clusterEvalQ, clusterExport, clusterMap, parApply, parCapply, parLapply, parLapplyLB, parRapply, parSapply, parSapplyLB
The following objects are masked from ‘package:stats’: IQR, mad, sd, var, xtabs
The following objects are masked from ‘package:base’: anyDuplicated, append, as.data.frame, basename, cbind, colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank, rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply, union, unique, unsplit, which, which.max, which.min
Welcome to Bioconductor
Vignettes contain introductory material; view with 'browseVignettes()'. To cite Bioconductor, see 'citation("Biobase")', and for packages 'citation("pkgname")'.
Loading required package: BiocParallel
Loading required package: MSnbase
Loading required package: mzR
Loading required package: Rcpp
Loading required package: S4Vectors
Loading required package: stats4
Attaching package: ‘S4Vectors’
The following object is masked from ‘package:base’: expand.grid
Loading required package: ProtGenerics
This is MSnbase version 2.10.0
Visit https://lgatto.github.io/MSnbase/ to get started.
Attaching package: ‘MSnbase’
The following object is masked from ‘package:stats’: smooth
The following object is masked from ‘package:base’: trimws
This is xcms version 3.6.1
Attaching package: ‘xcms’
The following object is masked from ‘package:stats’: sigma
Error: stop worker failed: attempt to select less than one element in OneIndex
In addition: Warning messages:
1: In serialize(data, node$con) : 'package:stats' may not be available when loading
2: In serialize(data, node$con) : 'package:stats' may not be available when loading
So, I can't work out why the code doesn't run in R. I am using:
Could someone help me?