Closed yufree closed 7 years ago
Hi Miao YU,
thanks for the detailed report. I only get warnings, and couldn't check the resulting xset in detail. I also started a discussion in #174 on whether there is a way to avoid the lockmass correction in the first place in the future. Your input there would be highly appreciated.
##This is the one with correction
xset <-
xcms::xcmsSet(
cdffiles,
BPPARAM = BiocParallel::SnowParam(workers = 12),
method = "centWave",
ppm = 15,
peakwidth = c(5, 20),
prefilter = c(0, 0),
lockMassFreq = T
)
[...]
In matrix(lockMass, ncol = 2, byrow = TRUE) :In matrix(lockMass, ncol = 2, byrow = TRUE) :In matrix(lockMass, ncol = 2, byrow = TRUE) :
data length [11] is not a sub-multiple or multiple of the number of rows [6]
data length [11] is not a sub-multiple or multiple of the number of rows [6]
data length [11] is not a sub-multiple or multiple of the number of rows [6]
> xset
An "xcmsSet" object with 3 samples
Time range: 3.6-1079.7 seconds (0.1-18 minutes)
Mass range: 104.1082-999.8066 m/z
Peaks: 3481 (about 1160 per sample)
Peak Groups: 0
Sample classes: figshare
Peak picking was performed on MS1.
Profile settings: method = bin
step = 0.1
Memory usage: 0.53 MB
Yours, Steffen
> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 17.04
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=de_DE.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=de_DE.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=de_DE.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] xcms_1.50.1 Biobase_2.34.0 ProtGenerics_1.6.0
[4] BiocGenerics_0.20.0 mzR_2.8.1 Rcpp_0.12.10
loaded via a namespace (and not attached):
[1] RANN_2.5 lattice_0.20-34 codetools_0.2-15
[4] snow_0.4-2 MASS_7.3-45 MassSpecWavelet_1.40.0
[7] grid_3.3.2 plyr_1.8.4 stats4_3.3.2
[10] S4Vectors_0.12.2 Matrix_1.2-8 splines_3.3.2
[13] BiocParallel_1.8.2 RColorBrewer_1.1-2 tools_3.3.2
[16] compiler_3.3.2 survival_2.40-1 multtest_2.30.0
Hi Steffen,
Thanks for your test!
Today I also tested it on Ubuntu 16.10 and got the same error, and likewise on my Mac. Based on this, I wonder whether the problem is related to the operating system's compiler for BiocParallel.
Besides, the error is only returned when I fill peaks using `fillPeaks`, not by `xcmsSet`. This is the weirdest part, since both should use similar profile methods.
I will try another approach to find the source of this error.
Update: Here is the error:
Error: BiocParallel errors
element index: 1, 2, 3
first error: 'fromIdx' has to be smaller than 'toIdx'!
6. stop(.error_bplist(res))
5. bplapply(argList, fillPeaksChromPar, BPPARAM = BPPARAM)
4. bplapply(argList, fillPeaksChromPar, BPPARAM = BPPARAM) at methods-xcmsSet.R#1181
3. .local(object, ...)
2. fillPeaks.chrom(xset2) at AllGenerics.R#105
1. fillPeaks.chrom(xset2)
I checked the source code and found the following code for `fillPeaks` (MPI.R) related to this issue:
lcraw <- xcmsRaw(arg$file, profmethod=params$prof$method, profstep = 0)
if (length(params$dataCorrection) > 1) {
    ## Note: dataCorrection (as set in the xcmsSet function) is either
    ## 1 for all or for none.
    if (any(params$dataCorrection) == 1)
        lcraw <- stitch(lcraw, AutoLockMass(lcraw))
}
I also saw the `stitch` function in `findpeak`, where the code looks similar to the code above:
xRaw <- xcmsRaw(arg$file, profmethod=params$profmethod,
                profparam=params$profparam, profstep = 0,
                includeMSn=params$includeMSn, mslevel=params$mslevel,
                scanrange=params$scanrange)
if(params$lockMassFreq == TRUE){
    xRaw <- stitch(xRaw, AutoLockMass(xRaw))
}
The issue is that the latter works fine, while the former, from `fillPeaks`, returns an error.
Thanks, Miao
Thanks @yufree for looking into the code. What I find a little strange now is that the two methods use different parameters (`lockMassFreq` vs `dataCorrection`). I'll investigate.
The code was made to work with the snow framework and probably wasn't updated to the Bioconductor parallel version. `dataCorrection` is essentially a binary flag, and `lockMassFreq` is a detected parameter, since asking users to find the lock mass frequency themselves is unrealistic. The whole trouble with this situation is that you don't have access to the original Waters raw file, so it's really hard to tell where the lock mass scans are in a way that is robust across labs. I haven't looked at the method in R XCMS in a while; I believe `dataCorrection` is a binary matrix with an entry for each file, but I could be mistaken. It's late here, so maybe tomorrow.
On May 18, 2017, at 10:38 PM, Johannes Rainer notifications@github.com wrote:
Thanks @yufree for looking into the code. What I find a little strange now is that both methods use a different parameter (lockMassFreq vs dataCorrection). I'll investigate.
Checked the code. `lockMassFreq` and `dataCorrection` are fine. The error is thrown by the new binning function `binYonX` after data stitching. While `centWave` does not require profile matrix generation (i.e. the binning), `fillPeaks` does, since peaks are filled in from the profile matrix (and not the actual raw data).
I suspect it has something to do with the `@scanindex` vector.
Now, a shorter test that shows the error is:
> library(xcms)
> cdffiles <- dir(pattern = "mzXML")
> lr <- xcmsRaw(cdffiles[1])
Create profile matrix with method 'bin' and step 1 ... OK
> lr2 <- stitch(lr, AutoLockMass(lr))
Warning message:
In matrix(lockMass, ncol = 2, byrow = TRUE) :
data length [11] is not a sub-multiple or multiple of the number of rows [6]
> pm <- profMat(lr2, step = 1)
Create profile matrix with method 'bin' and step 1 ... Error in binYonX(mz, int, breaks = brks, fromIdx = fromIdx, toIdx = toIdx, :
'fromIdx' has to be smaller than 'toIdx'!
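A side note on the warning in the session above: its text says that `lockMass` had 11 entries when it was reshaped into two columns. The base R sketch below reproduces only that reshaping step on made-up scan numbers; `lock_mass` and its values are hypothetical, and nothing here is taken from the xcms implementation beyond the `matrix()` call shown in the warning.

```r
# 11 hypothetical lock mass scan numbers (an odd count, as in the warning).
lock_mass <- seq(10, 510, by = 50)

# Capture the warning text instead of printing it.
w <- NULL
m <- withCallingHandlers(
  matrix(lock_mass, ncol = 2, byrow = TRUE),
  warning = function(cond) {
    w <<- conditionMessage(cond)
    invokeRestart("muffleWarning")
  }
)

dim(m)   # 6 2
m[6, ]   # 510 10: the last cell is filled by recycling the first value
```

With an odd number of entries, `matrix()` pads the final row by recycling, so the last pair mixes the final scan with the first one. Whether that matters downstream depends on how the stitching code uses the matrix, which this sketch does not cover.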
OK, fixed it:
> library(xcms)
> cdffiles <- dir(pattern = "mzXML")
> lr <- xcmsRaw(cdffiles[1])
Create profile matrix with method 'bin' and step 1 ... OK
> lr2 <- stitch(lr, AutoLockMass(lr))
> pm <- profMat(lr2, step = 1)
Create profile matrix with method 'bin' and step 1 ... OK
The `stitch.xml` method was looping from scan 1 to length() - 1 and thus failed to add the mz and intensity values from the last spectrum to the new, stitched object (while still adding the information to the `@scanindex` vector). This caused the error.
See notes in the code of commit https://github.com/sneumann/xcms/commit/56c43ddfd7eece4fe58f1f81da66714bf23e32ef#diff-69d6894f08d72ebed9550250be653318R2376 for explanations and the fix.
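The off-by-one described above can be illustrated on toy data in base R. This is a minimal sketch, not the xcms code: `stitch_toy` and the `scans` list are hypothetical, and it assumes the scan index stores the zero-based start offset of each scan in the concatenated value vector.

```r
# Toy spectra: each scan is a vector of m/z values (made-up numbers).
scans <- list(c(100.1, 200.2), c(150.5, 250.6, 300.7), c(120.3, 220.4))

# Hypothetical helper: concatenate the first `last` scans, but always
# build the scan index for ALL scans, mimicking the reported mismatch.
stitch_toy <- function(scans, last) {
  mz <- numeric(0)
  for (i in seq_len(last)) mz <- c(mz, scans[[i]])
  lens <- lengths(scans)
  scanindex <- cumsum(c(0, lens[-length(lens)]))  # zero-based offsets
  list(mz = mz, scanindex = scanindex)
}

buggy <- stitch_toy(scans, last = length(scans) - 1)  # skips the last scan
fixed <- stitch_toy(scans, last = length(scans))      # includes every scan

# 1-based position where the last scan should start.
from_last <- buggy$scanindex[length(scans)] + 1

from_last          # 6
length(buggy$mz)   # 5: the last scan starts beyond the end of the data
length(fixed$mz)   # 7: the last scan covers positions 6 to 7
```

In the buggy variant the index points past the end of the stored values, so any binning step that derives a from/to range for the last scan gets a start larger than its end, which matches the kind of `fromIdx`/`toIdx` complaint reported in this thread.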
@yufree since it might take some time until we merge that into Bioc devel, you can install the newest xcms from GitHub (which includes the fix) like this:
library(devtools)
install_github("sneumann/xcms", ref = "xcms3")
Now it works!
Thanks a lot! @jotsetung @hpbenton @sneumann
Hi, I encountered a weird issue with Waters data.
I converted the raw data from RAW to mzXML with msconvert (which now has a lockmass refiner to correct the data). The mzXML data still have the lock mass gap, so I use `lockMassFreq = T` to fill the gap, and it finds more peaks using the stitch method. However, if I use this method, `fillPeaks` always returns BiocParallel errors:
If I set `lockMassFreq = F`, the errors disappear. I am confused by the conflict between those two functions. Here are the dataset and code I used:
demo dataset: https://figshare.com/articles/demo_data_for_the_error/5007965
code:
Here is my rsessioninfo:
I checked the code and suspect the error comes from the bin method for `fillPeaks`. However, I am not sure about the details. Such errors might be reproducible on Waters data when using `lockMassFreq = T`.