sneumann / xcms

This is the git repository matching the Bioconductor package xcms: LC/MS and GC/MS Data Analysis
Other
177 stars 81 forks source link

Waters data problems with Obiwarp #739

Closed parasitetwin closed 2 months ago

parasitetwin commented 2 months ago

Hello!

So we've been working with Waters data in our lab since half a year back and want to use xcms for processing. We have, however, run into some trouble with Obiwarp when we're trying to use the "adjustRTime()" function on the data. I think that I have been able to find the source of the problem but I'm not sure how to alleviate it on my own.

Waters ".raw" files always (at least for the two instruments we've been working with; SynaptXS and MRTs) contain different channels for MS1 fullscan data (channel 1) and for the lock mass (channel 2). When converting to ".mzML" format in MSConvert we remove the lock mass channel by using the subset option and only keeping the first channel (MS1 fullscan data). Using xcms 3 the following error was received for every file:

Aligning 2024-02-03_B1W49_RP_NEG_sQC07_172.mzML against 2024-02-03_B1W49_RP_NEG_sQC03_128.mzML ... Found gaps in scan times of the center sample: cut scantime-vector at 29.95886743068 seconds.
Found gaps in scan time of file 2024-02-03_B1W49_RP_NEG_sQC07_172.mzML: cut scantime-vector at 29.9558508396 seconds.
OK

The lockmass signal is measured every 30 seconds which is what lead me to believe that the gap created by the measurement on the second channel is what is causing the problem.

In xcms 4 the problem seems to persist. I base this on the plot received when running this code:

group_colors <- paste0(brewer.pal(3, "Set1")[1:2], "60")
sample_colors <- group_colors[as.factor(xcmsData@sampleData$sample_group)]
par(mfrow = c(3, 1), mar = c(4.5, 4.2, 1, 0.5))
plot(bpis, col = sample_colors)
grid()
plot(bpis_adj, col = sample_colors)
grid()
## Plot also the difference of adjusted to raw retention time.
plotAdjustedRtime(xcmsData_rtAdj, col = sample_colors)
grid()

image As can be seen in the picture the scantime vector does seem to be cut at ~30 seconds since all the warping is done previous to that point.

In xcms 4 running "adjustRTime()" didn't generate any warnings (I realize this can be due to my settings and setup hehe) but based on the warning from xcms 3 I dug into the code to find the source of the problem and found this section:

## 1)Check the scan times of both objects:
        scantime1 <- unname(rtime(cntr))
        scantime2 <- unname(rtime(z))
        scantime1_diff <- diff(scantime1)
        scantime2_diff <- diff(scantime2)
        ## median difference between spectras' scan time.
        mstdiff <- median(c(scantime1_diff, scantime2_diff), na.rm = TRUE)

        ## rtup1 <- seq_along(scantime1)
        ## rtup2 <- seq_along(scantime2)

        mst1 <- which(scantime1_diff > 5 * mstdiff)[1]
        if (!is.na(mst1)) {
            message("Found gaps in scan times of the center sample: cut ",
                    "scantime-vector at ", scantime1[mst1]," seconds.")
            scantime1 <- scantime1[seq_len(max(2, (mst1 - 1)))]
        }
        mst2 <- which(scantime2_diff > 5 * mstdiff)[1]
        if (!is.na(mst2)) {
            message("Found gaps in scan time of file ", basename(fileNames(z)),
                    ": cut scantime-vector at ", scantime2[mst2]," seconds.")
            scantime2 <- scantime2[seq_len(max(2, (mst2 - 1)))]
        }

My guess is that the gap in the scantime-vector caused by the missing scan (which is due to the scan being made on the separate lockmass channel) is causing the if-statement for mst1 and mst2 to trigger and thus cutting the scantime-vector short.

We have tried not removing the lockmass channel prior to conversion to ".mzML" but that hasn't helped.

So I guess what I'm asking is if it would be possible to take into account whether the original file came from a Waters instrument and in that case just skipping the scantime vector step? Or perhaps to add a manual setting through an extra parameter which circumvents the cutting of the scantime vector?

Thanks for all your hard work on this excellent library!

Cheers, Anton Ribbenstedt

jorainer commented 2 months ago

very brave to actually look into the relevant xcms R-code. If possible, I would however suggest to get rid of the lock mass scans somehow - which should (?) solve this problem - I guess...

sneumann commented 2 months ago

Hi, I think @parasitetwin tried exactly that: "we remove the lock mass channel by using the subset option". I don't think we'll special-case Waters data as such, but I'd like to understand what's going wrong with the subset' data. Can we haz' an example, plus copy&pasteable code what works and what doesn't ? It seems Obiwarp is making assumptions about average / maximum distance between scans. Maybe we can make the 5 in scantime2_diff > 5 * mstdiff configurable ? If you hack that into some large number, what happens to the alignment around the lockmass scans ? Yours, Steffen

parasitetwin commented 2 months ago

Hello Steffen!

Here's a link to the data I've been using: https://chalmers-my.sharepoint.com/:f:/g/personal/antonri_chalmers_se/ErDJ_ByAqY1EqxgQXiPPYboB93UgGgBERQ8uPR82XIhU9Q?e=o5WAKx

And here's the code I've been running:

library(xcms)
library(MsExperiment)
library(stats)
library(IPO2)
library(nloptr)
library(RColorBrewer)
library(MsCoreUtils)
library(Spectra)
library(MsBackendMgf)

inputFolder <- "C:/Your/Folder/"
pd <- data.frame(sample_name = list.files(inputFolder),
                 sample_group = c(rep("sQC", 10)),
                 stringsAsFactors = FALSE)

xcmsData <- readMsExperiment(spectraFiles = list.files(inputFolder, full.names = T), sampleData = pd)
xcmsData <- filterSpectra(xcmsData, filterEmptySpectra)

cwp <- CentWaveParam(peakwidth = c(2, 59.84576), noise = 400,
                     prefilter = c(3, 1500),
                     ppm = 4.405003,
                     mzdiff = -0.008364678)
xcmsData <- findChromPeaks(xcmsData, param = cwp)

bpis <- chromatogram(xcmsData, aggregationFun = "max")

xcmsData_rtAdj <- adjustRtime(xcmsData, param = ObiwarpParam(binSize = 0.6))

bpis_adj <- chromatogram(xcmsData, aggregationFun = "max", chromPeaks = "none")
group_colors <- paste0(brewer.pal(3, "Set1")[1:2], "60")
sample_colors <- group_colors[as.factor(xcmsData@sampleData$sample_group)]
par(mfrow = c(3, 1), mar = c(4.5, 4.2, 1, 0.5))
plot(bpis, col = sample_colors)
grid()
plot(bpis_adj, col = sample_colors)
grid()
plotAdjustedRtime(xcmsData_rtAdj, col = sample_colors)
grid()

@sneumann Yes! From my understanding of the problem I think that would probably solve it. Haven't played around with any changes on my own so far so haven't tried upping it to a large number, but sounds like a likely solution :)

Cheers, Anton

jorainer commented 2 months ago

hm, yes, the first MS2 scan (I guess that's the lockmass) is at 30 seconds. If obiwarp starts crying each time when there is a gap in the sequence of retention time because a (or more) MS2 spectra were recorded we will run into the same problem with any LC-MS/MS data. I never use obiwarp, so have never observed this problem so far.

So @sneumann 's suggestion to allow changing the hard-coded value 5 seems the most reasonable solution. No idea what the reasoning of this value was in the first place...

jorainer commented 2 months ago

I've added parameter rtimeDifferenceThreshold to ObiwarpParam. It defaults to 5, but can be increased to avoid the above situations. @parasitetwin , can you please confirm that it works? I would then create the PR.

parasitetwin commented 2 months ago

Hello again!

This is the result from utilizing the new "rtimeDifferenceThreshold" Obiwarp parameter and setting it to 50: image

This worked beautifully :) exactly what we needed. Thank you both for all your help!

Cheers, Anton

jorainer commented 2 months ago

Great! I will then make a PR for this fix.

sneumann commented 2 months ago

PR #740 merged now. Thanks everyone for your help ! Yours, Steffen

parasitetwin commented 2 months ago

Thanks everyone!