xcmsSet failing at finding peaks (in polarity switching runs ?)

videlc commented 3 years ago

Hi,

I'm trying to enhance peak detection performances to an existing package by integrating super-fast chromatographic peak detection capabilities of xcms. During control testing performed on data published with that package, I've been surprised to notice that xcms failed at detecting some (I did not test them all...) of the highest peaks present in the polarity switching mzML file.

Example with attempts to detect [LGD4033+HCOO]- or [LGD4033+H]:

xset<-xcmsSet("LGD_DIA_peak-picked.mzML",method="centWave",ppm=10,peakwidth=c(2,30),snthresh=0,integrate=1,noise=0,
              mzCenterFun="wMean")

xset_df<-data.frame(xset@peaks)

xset_df[which(xset_df$mz > 383.0835-10/1e6*383.0835 & xset_df$mz < 383.0835+10/1e6*383.0835),]
xset_df[which(xset_df$mz > 339.09266-10/1e6*339.09266 & xset_df$mz < 339.09266+10/1e6*339.09266),]

returns :

However, peaks are detectable using MSnbase/chromatogram functions :

mzfile<-"LGD_DIA_peak-picked.mzML"
mzdata<-readMSData(mzfile,mode="onDisk")
chr<-chromatogram(
  mzdata,
  mz=c(383.0835-383.0835*10/1e6,383.0835+383.0835*10/1e6)
)

chr<-data.frame(rt=chr[[1]]@rtime,int=chr[[1]]@intensity)

chr$scan<-ifelse(is.na(chr$int),"pos","neg")

require(ggplot2)
ggplot(data=chr,aes(x=rt,y=int))+
  geom_path()+geom_point()+
  coord_cartesian(xlim = c(750,850))+
  facet_grid(scan~.)

mzfile<-"LGD_DIA_peak-picked.mzML"
mzdata<-readMSData(mzfile,mode="onDisk")
chr<-chromatogram(
  mzdata,
  mz=c(339.09266-339.09266*10/1e6,339.09266+339.09266*10/1e6)
)

chr<-data.frame(rt=chr[[1]]@rtime,int=chr[[1]]@intensity)

chr$scan<-ifelse(is.na(chr$int),"neg","pos")

require(ggplot2)
ggplot(data=chr,aes(x=rt,y=int))+
  geom_path()+geom_point()+
  coord_cartesian(xlim = c(750,850))+
  facet_grid(scan~.)

Produces :

and

Single polarity working example with the detection of cocaine M+H:

xset<-xcmsSet("U_H_COCA_peak-picked.mzML",method="centWave",ppm=10,peakwidth=c(2,30),snthresh=0,integrate=1,noise=0,
              mzCenterFun="wMean")
xset_df<-data.frame(xset@peaks)

xset_df[which(xset_df$mz > 304.1543-10/1e6*304.1543 & xset_df$mz < 304.1543+10/1e6*304.1543),]

Produces :

The major difference that comes to my mind is that the LGD dataset is acquired in fullMS-DIA polarity switching mode whereas cocaine dataset is acquired in fullMS-DIA positive mode only.

Any (other) idea ?

With kindest regards, Vivian

jorainer commented 3 years ago

Firstly, I would appreciate if you could use the newer functions instead of the xcmsSet call - please have a look at the vignette. The new functions build directly on the MSnbase package and its objects.

Now, from the chromatograms you extract it looks like you have very narrow peaks (with 3-4 data points only and the peak width being ~ 2 seconds). What I would do is to consider changing the peakwidth parameter, to e.g. c(1.5, 5):

mzfile <- "LGD_DIA_peak-picked.mzML"
mzdata <- readMSData(mzfile, mode = "onDisk")
mzdata <- findChromPeaks(mzdata, CentWaveParam(peakwidth = c(1.5, 5)))

also, maybe rethink the ppm parameter: this is not the precision of your instrument. It is the maximal acceptable difference of m/z values in consecutive scans - usually it doesn't hurt to increase that value - I use 40ppm for our data, with an instrument that's considered to have a 5ppm error (more info about that parameter provided in the vignette).

stanstrup commented 3 years ago

Shooting from the hip here:

Centwave uses ROI detection whereas I assume detecting in the chromatogram does not. As far as I know there is a hard limit of a minimum of 4 scans above baseline in the ROI. "If the largest number of sequential points above the baseline is found to be greater than or equal to either four (hard coded), or the minimum peak width minus two, whichever is larger, the ROI continues to the next step, otherwise it is discarded." https://pubs.acs.org/doi/suppl/10.1021/acs.analchem.7b01069/suppl_file/ac7b01069_si_001.pdf you might be hitting that.
AFAIK the old XCMS at least won't be able to recognize that there are two different traces/modes. So it is trying to detect peaks in data where all the scans are tangled together. I assume you'd also need to filter with the new approach before peak detection? @jorainer

videlc commented 3 years ago

Hi,

Thanks for your answer. The peak is indeed sharp (~6 s at FWHM) and the low number of points is due to the MS method. Data acquisition was performed on a QE-HF (polarity switching ~~ 1s) with :

FullMS (pos) then 9 windows DIA (pos)
FullMS (neg) then 9 widows DIA (neg)

This results in a duty cycle between two FullMS (pos) scans of ~~ 2 secs Running the newer command did not perform better :


mzfile <- "LGD_DIA_peak-picked.mzML"
mzdata <- readMSData(mzfile, mode = "onDisk")  
mzdata <- findChromPeaks(mzdata, CentWaveParam(peakwidth = c(1.5, 5),ppm = 50,snthresh = 0,noise = 0))

xset_df<-data.frame(mzdata@msFeatureData$chromPeaks)

xset_df[which(xset_df$mz > 383.0835-10/1e6*383.0835 & xset_df$mz < 383.0835+10/1e6*383.0835),]
xset_df[which(xset_df$mz > 339.09266-10/1e6*339.09266 & xset_df$mz < 339.09266+10/1e6*339.09266),]

yields :

However, by taking a look at the results findChrompeaks :

> xset_df[which.max(xset_df$sn),]
             mz    mzmin   mzmax       rt    rtmin    rtmax    into    intb    maxo      sn sample
CP0550 338.0823 338.0814 338.085 802.4849 800.3239 806.7729 5242005 5241997 2085130 2085129      1

338.0823 is indeed detected in the pos and neg modes (fun fact, it's the 13c isotope of [LGD4033-H]- in neg mode...). Data looks like this (note that this is RAW data not msconverted/peak-picked). :

m/z values xcms is missing :

The point of centwave ROI detection is valid since i've got few point per peaks. Orbitrap is a slower scanner than a QTOF (but can switch polarities !). Is there any workaround ?

Best regards, Vivian

stanstrup commented 3 years ago

I would think that you'd need filterPolarity before peak-picking to make it work.

videlc commented 3 years ago

Do you mean when converting ? Sounds not so convenient. Isn't polarity= an option in xcms? It'd be convenient to get an extra column in chrompeaks with polarity.

jorainer commented 3 years ago

Very good point indeed @stanstrup ! xcms considers all spectra in the peak detection - it does not discriminate between pos and neg polarity. This might indeed be the reaason why peak detection fails for your data. So, as Jan suggests, you should split your data set by polarity:

mzdata_pos <- filterPolarity(mzdata, polarity = 1)
mzdata_neg <- filterPolarity(mzdata, polarity = 0)

and perform the peak detection separately for the the two

videlc commented 3 years ago

Yay, indeed super good point !


mzfile <- "LGD_DIA_peak-picked.mzML"
mzdata <- readMSData(mzfile, mode = "onDisk") 
mzdata <- filterPolarity(mzdata, polarity = 0)
mzdata <- findChromPeaks(mzdata, CentWaveParam(peakwidth = c(2, 30),ppm = 10,snthresh = 0,noise = 0))
xset_df<-data.frame(mzdata@msFeatureData$chromPeaks)
xset_df[which(xset_df$mz > 383.0835-10/1e6*383.0835 & xset_df$mz < 383.0835+10/1e6*383.0835),]
xset_df[which(xset_df$mz > 339.09266-10/1e6*339.09266 & xset_df$mz < 339.09266+10/1e6*339.09266),]

mzfile <- "LGD_DIA_peak-picked.mzML"
mzdata <- readMSData(mzfile, mode = "onDisk") 
mzdata <- filterPolarity(mzdata, polarity = 1)
mzdata <- findChromPeaks(mzdata, CentWaveParam(peakwidth = c(2, 30),ppm = 50,snthresh = 0,noise = 0))
xset_df<-data.frame(mzdata@msFeatureData$chromPeaks)
xset_df[which(xset_df$mz > 339.09266-10/1e6*339.09266 & xset_df$mz < 339.09266+10/1e6*339.09266),]
xset_df[which(xset_df$mz > 383.0835-10/1e6*383.0835 & xset_df$mz < 383.0835+10/1e6*383.0835),]

produces :

While this is a clear solution. I think the issue is closable. However, shouldn't it be automatic (i.e. peak searching each polarity at a time). I don't see why one would consider any m/z that can be detected in both polarities considering the different adducts.

sneumann commented 3 years ago

Hi, only slightly related to the actual issue here: Orbitrap is a slower scanner than a QTOF (but can switch polarities !). I would be interested whether that it is really the case without any loss in sensitivity. To test, you'd use the same method as your pos(1)/neg(2) setup above, but "change" to pos(1)/pos(2). Then you could compare the two pos(1) parts. My guess would be that it's better than on a TOF, but still some difference in signal. Curious to hear whether that's indeed the case. Yours, Steffen

videlc commented 3 years ago

I really don't know which would be better, only got a littleexperience with qtofs. However, that exact same sample was run on an impact II in a FullMS/DDA approach as well with much fewer significant hits (i.e. fewer in-vitro generated metabolites detected). However, I would not conclude too fast by saying that pos/neg switching is more sensitive.

From our experience, one clear benefit of pos/neg switching (aside from higher throughput) is that mass spec is less subject to charge effect because it is discharging every time the polarity switches. This is beneficial in terms of sensitivity.

videlc commented 3 years ago

To add to my previous comment @sneumann, I'll do some testing and give you a feedback based on your suggestion. If I undestood clearly, running the same sample with: (1) FullMS(pos)/FullMS(neg) (2) FullMS(pos)/FullMS(pos) and compare various ions with the pos(1) scan.

Thanks anyways. Courious to hear what you think on the automatic polarity switching chrompeak definition without requiring the filterpolarity option. Best regards, Vivian

videlc commented 3 years ago

Here is the follow-up @sneumann . Same sample (5 µL plasma spe extract) injected twice with two methods : (1) fullMS pos (m/z 50-500) // fullMS neg (m/z 50-500) (2) fullMS pos (m/z 50-500) // fullMS pos(m/z 50-501) (501 to filter it out afterwards)

Converted using msconvert + peakpicking

require(xcms)

testdata_posneg<-c("pMeOH_POSNEG.mzML")
testdata_posneg <- readMSData(testdata_posneg, mode = "onDisk") 
testdata_posneg <- filterRt(testdata_posneg,c(400,550))
testdata_posneg <- filterPolarity(testdata_posneg, polarity = 1)
testdata_posneg <- findChromPeaks(testdata_posneg, CentWaveParam(peakwidth = c(2, 30),ppm = 10,snthresh = 5,noise = 100))
testdata_posneg<-data.frame(testdata_posneg@msFeatureData$chromPeaks)

testdata_posneg$sample<-"posneg"

testdata_pospos<-c("pMeOH_POSPOS.mzML")
testdata_pospos <- readMSData(testdata_pospos, mode = "onDisk") 
testdata_pospos <- filterRt(testdata_pospos,c(400,550))
testdata_pospos <- testdata_pospos[testdata_pospos@featureData@data$scanWindowUpperLimit==500,]
testdata_pospos <- filterPolarity(testdata_pospos, polarity = 1)
testdata_pospos <- findChromPeaks(testdata_pospos, CentWaveParam(peakwidth = c(2, 30),ppm = 10,snthresh = 5,noise = 100))
testdata_pospos<-data.frame(testdata_pospos@msFeatureData$chromPeaks)

testdata_pospos$sample<-"pospos"

comp<-rbind(testdata_pospos,testdata_posneg)

require(dplyr)

comp %>% group_by(sample) %>% summarise(n=n())

Produces :

More peaks in pospos.

Common peaks area comparison neg vs pos:

comp<-data.frame(mz=testdata_posneg$mz,rt=testdata_posneg$rt,into=testdata_posneg$into,into=testdata_posneg$into)

comp$rt_pospos<-0
comp$mz_pospos<-0
comp$into_pospos<-0
comp$into_pospos<-0

for(n in 1:nrow(comp)){

  comp_mz<-comp$mz[n]
  comp_rt<-comp$rt[n]

  temp<-testdata_pospos[which(testdata_pospos$mz > comp_mz-5*comp_mz/1e6 & 
                                testdata_pospos$mz < comp_mz+5*comp_mz/1e6 &
                                testdata_pospos$rt > comp_rt-5 & 
                                testdata_pospos$rt < comp_rt+5),]
  temp$rtdiff<-abs(temp$rt-comp_rt)
  temp<-temp[which.min(temp$rtdiff),]

  if(nrow(temp)==1){
  comp$rt_pospos[n]<-temp$rt
  comp$mz_pospos[n]<-temp$mz
  comp$into_pospos[n]<-temp$into
  comp$into_pospos[n]<-temp$into
  print("found")
  }else{print('not found')}

}

require(ggplot2)

ggplot(data=comp,aes(x=into,y=into_pospos))+
  geom_point(alpha=0.2)+
  geom_abline(intercept = 0,slope = 1)+
  scale_x_log10()+
  scale_y_log10()

Produces :

Area tend to be higher in posneg and peaks not found in pospos have area in posneg within e5-e6 range (low intensity)

Reverse comparison pos vs neg :


comp<-data.frame(mz=testdata_pospos$mz,rt=testdata_pospos$rt,into=testdata_pospos$into,into=testdata_pospos$into)
comp$rt_posneg<-0
comp$mz_posneg<-0
comp$into_posneg<-0
comp$into_posneg<-0

for(n in 1:nrow(comp)){

  comp_mz<-comp$mz[n]
  comp_rt<-comp$rt[n]

  temp<-testdata_posneg[which(testdata_posneg$mz > comp_mz-5*comp_mz/1e6 & 
                                testdata_posneg$mz < comp_mz+5*comp_mz/1e6 &
                                testdata_posneg$rt > comp_rt-5 & 
                                testdata_posneg$rt < comp_rt+5),]
  temp$rtdiff<-abs(temp$rt-comp_rt)
  temp<-temp[which.min(temp$rtdiff),]

  if(nrow(temp)==1){
    comp$rt_posneg[n]<-temp$rt
    comp$mz_posneg[n]<-temp$mz
    comp$into_posneg[n]<-temp$into
    comp$into_posneg[n]<-temp$into
    print("found")
  }else{print('not found')}

}

require(ggplot2)

ggplot(data=comp,aes(x=into,y=into_posneg))+
  geom_point(alpha=0.2)+
  geom_abline(intercept = 0,slope = 1)+
  scale_x_log10()+
  scale_y_log10()

Produces:

Same area profile with posneg being higher than pospos for common peaks. Pospos only detected peaks are more broadly distributed.

looking at the lowest peak in each chrompeaks :

> testdata_posneg[c(which.min(testdata_posneg$into)) ,]
            mz    mzmin    mzmax      rt    rtmin    rtmax     into     intb     maxo sn sample
CP123 280.1611 280.1608 280.1613 504.102 502.2621 506.8588 82860.25 81077.57 27060.73 10 posneg
> 
> testdata_pospos[c(which.min(testdata_pospos$into)) ,]
             mz    mzmin    mzmax       rt    rtmin    rtmax     into     intb     maxo    sn
CP0059 192.1742 192.1741 192.1743 419.9922 419.9922 420.7957 12052.23 12051.43 18949.12 27189
       sample
CP0059 pospos

Posneg lowest is not detected in pospos and pospos lowest is seen (not detected by xcms) in posneg. Don't know what to conclude about this since i think the fewer datapoints per peaks in posneg (slow polarity switch) is affecting xcms peak detection.

Best regards, Vivian

sneumann / xcms

xcmsSet failing at finding peaks (in polarity switching runs ?) #533