Open Pascallio opened 8 months ago
I dove a little bit deeper into the code and it's likely an indexing issue. I've extracted the values that go into getEIC
in order to extract the result from the function. Using the variables from the previous code gives us the following plot (black = original chromatogram, red = EIC chromatogram):
scanindex <- 1:length(rts) - 1
mzrange <- c(1, 1)
sr <- c(1, length(rts))
eic <- .Call("getEIC", mzs, intensities, as.integer(scanindex), as.double(mzrange),
as.integer(sr), as.integer(length(scanindex)),
PACKAGE = "xcms"
)
plot(intensities, type = "l", ylim = c(0, 2e5))
lines(eic$intensity, col = "red")
It seems that the getEIC
function adds the previous scan intensity to the current scan intensity. A one-liner using dplyr confirms this:
points(intensities + dplyr::lag(intensities, n = 1), col = "green")
I haven't found the solution yet, but it's either in the loop in getScanEIC
, or in the lowerBound
or upperBound
functions:
https://github.com/sneumann/xcms/blob/db80f2cb442b7fc6d3db41b105c64d0536ab8206/src/mzROI.c#L440-L447
Hi, thanks for the report, we should turn this into a unit test once we've figured exactly what's going on. Yours, Steffeb
Thanks Steffen,
Interestingly, this issue does not occur in the S4 method findChromPeaks
. It specifically occurs in the do_*
version of the method.
# Load library
library(xcms)
# Select MRM mzML from msdata package
data <- readSRMData(system.file("proteomics", "MRM-standmix-5.mzML.gz", package = "msdata"))
# Use 0.1 - 1 due to RT being in minutes
data <- findChromPeaks(data[5, ], CentWaveParam(peakwidth = c(0.1, 1)))
chromPeaks(data)
Returns the top peak with the correct maxo of 61141.1367
Hi, I've noticed that
do_findChromPeaks_centWave
returns an incorrectmaxo
value and likely incorrectinto
andintb
values. According to the documentation,maxo
represents the maximum value of the peak, but in the following example it returns almost double that value (120881.1 versus 61141.14).The chromatogram in question:
I've tried to narrow the issue down. The maxo variable is defined here: https://github.com/sneumann/xcms/blob/db80f2cb442b7fc6d3db41b105c64d0536ab8206/R/do_findChromPeaks-functions.R#L568
and into and intb are defined here: https://github.com/sneumann/xcms/blob/db80f2cb442b7fc6d3db41b105c64d0536ab8206/R/do_findChromPeaks-functions.R#L573-L575
Since both are depended on the
pd
variable (which is a subset of thed
variable), this issue probably affects both themaxo
andint*
values. Thed
variable represents the intensities of the found EIC:https://github.com/sneumann/xcms/blob/db80f2cb442b7fc6d3db41b105c64d0536ab8206/R/do_findChromPeaks-functions.R#L364-L368
I've confirmed that the issue occurs in the
getEIC
function, but I am unable to narrow the issue further due to my lack of knowledge on C.I've tried version 3.22 from Bioconductor and version 3.99.5 from GitHub and the issue exists in both versions.
Thanks, Pascal