sneumann / xcms

This is the git repository matching the Bioconductor package xcms: LC/MS and GC/MS Data Analysis
Other
177 stars 81 forks source link

do_findChromPeaks_centWave returns wrong maxo and int* values #694

Open Pascallio opened 8 months ago

Pascallio commented 8 months ago

Hi, I've noticed that do_findChromPeaks_centWave returns an incorrect maxo value and likely incorrect into and intb values. According to the documentation, maxo represents the maximum value of the peak, but in the following example it returns almost double that value (120881.1 versus 61141.14).

# Load library
library(xcms)

# Select MRM mzML from msdata package
data <- readSRMData(system.file("proteomics", "MRM-standmix-5.mzML.gz", package = "msdata"))

# Select a chromatogram
chrom <- data@.Data[[5]]

# Set the variables needed for findChromPeaks
intensities <- chrom@intensity  # Chromatogram intensities
mzs <- rep(1, length(intensities)) # Fixed MZ to eliminate ppm issues
rts <- chrom@rtime * 60 # Transform minutes to seconds
valsPerSpect <- table(rts) # 1 data point per rtime

# Run findChromPeaks with default values, should not matter for this example
xc <- as.data.frame(do_findChromPeaks_centWave(
  mz = mzs,
  int = intensities,
  scantime = rts,
  valsPerSpect = valsPerSpect
))

assertthat::are_equal(xc$maxo, max(intensities)) # FALSE
xc$maxo # 120888.1
max(intensities) # 61141.14

The chromatogram in question: image

I've tried to narrow the issue down. The maxo variable is defined here: https://github.com/sneumann/xcms/blob/db80f2cb442b7fc6d3db41b105c64d0536ab8206/R/do_findChromPeaks-functions.R#L568

and into and intb are defined here: https://github.com/sneumann/xcms/blob/db80f2cb442b7fc6d3db41b105c64d0536ab8206/R/do_findChromPeaks-functions.R#L573-L575

Since both are depended on the pd variable (which is a subset of the d variable), this issue probably affects both the maxo and int* values. The d variable represents the intensities of the found EIC:

https://github.com/sneumann/xcms/blob/db80f2cb442b7fc6d3db41b105c64d0536ab8206/R/do_findChromPeaks-functions.R#L364-L368

I've confirmed that the issue occurs in the getEIC function, but I am unable to narrow the issue further due to my lack of knowledge on C.

I've tried version 3.22 from Bioconductor and version 3.99.5 from GitHub and the issue exists in both versions.

Thanks, Pascal

Pascallio commented 8 months ago

I dove a little bit deeper into the code and it's likely an indexing issue. I've extracted the values that go into getEIC in order to extract the result from the function. Using the variables from the previous code gives us the following plot (black = original chromatogram, red = EIC chromatogram):

scanindex <- 1:length(rts) - 1
mzrange <- c(1, 1)
sr <- c(1, length(rts))

eic <- .Call("getEIC", mzs, intensities, as.integer(scanindex), as.double(mzrange),
             as.integer(sr), as.integer(length(scanindex)),
             PACKAGE = "xcms"
)

plot(intensities, type = "l", ylim = c(0, 2e5))
lines(eic$intensity, col = "red")

image

It seems that the getEIC function adds the previous scan intensity to the current scan intensity. A one-liner using dplyr confirms this:

points(intensities + dplyr::lag(intensities, n = 1), col = "green")

image

I haven't found the solution yet, but it's either in the loop in getScanEIC, or in the lowerBound or upperBound functions:

https://github.com/sneumann/xcms/blob/db80f2cb442b7fc6d3db41b105c64d0536ab8206/src/mzROI.c#L440-L447

sneumann commented 8 months ago

Hi, thanks for the report, we should turn this into a unit test once we've figured exactly what's going on. Yours, Steffeb

Pascallio commented 8 months ago

Thanks Steffen,

Interestingly, this issue does not occur in the S4 method findChromPeaks. It specifically occurs in the do_* version of the method.


# Load library
library(xcms)

# Select MRM mzML from msdata package
data <- readSRMData(system.file("proteomics", "MRM-standmix-5.mzML.gz", package = "msdata"))

# Use 0.1 - 1 due to RT being in minutes
data <- findChromPeaks(data[5, ], CentWaveParam(peakwidth = c(0.1, 1)))
chromPeaks(data)

Returns the top peak with the correct maxo of 61141.1367