sneumann / xcms

This is the git repository matching the Bioconductor package xcms: LC/MS and GC/MS Data Analysis

[Question] findChromPeaks input parameters. #598

Closed LaneMatthewJ closed 2 years ago

LaneMatthewJ commented 2 years ago

When using the findChromPeaks function, I get wildly different results depending on the input to the function.

When calling findChromPeaks on a chromatogram with the MatchedFilterParam, I wind up with a relatively reasonable number of extracted peaks:

data_prof <- readMSData(files = filePaths, mode = "onDisk", centroided = FALSE, pdata = new("NAnnotatedDataFrame", pd))
cropped <- filterRt(data_prof, c(375, 1400))
matchedFilterParam <- MatchedFilterParam(fwhm = 1.25, snthresh = 0.25, max = 500)
xdata_ChromatogramInput <- findChromPeaks(chromatogram(cropped), matchedFilterParam)
xdata_ChromatogramInput
| XChromatograms with 1 row and 2 columns
|                    1               2
|      <XChromatogram> <XChromatogram>
| [1,]      peaks: 107      peaks: 109
| phenoData with 2 variables
| featureData with 1 variables
| - - - xcms preprocessing - - -
| Chromatographic peak detection:
|  method: matchedFilter 

The above method yields some good looking peaks when zoomed into a subsection of the chromatogram (done similarly to the methods in the vignettes).
[Image: snth_narrow_search_fwhm1p25_76]

However, when passing in only the raw data (i.e. cropped instead of chromatogram(cropped)), as is done in the vignette, I wind up with a significantly larger number of peaks.

xdata_rawInput <- findChromPeaks(cropped, matchedFilterParam)
chromatogram(xdata_rawInput)
|  XChromatograms with 1 row and 2 columns
|                     1               2
|       <XChromatogram> <XChromatogram>
|  [1,]   peaks: 134642   peaks: 137581
|  phenoData with 2 variables
|  featureData with 1 variables
|  - - - xcms preprocessing - - -
|  Chromatographic peak detection:
|   method: matchedFilter 

Is there a reason that I'm missing as to why these are so wildly different? (My suspicion is that the "peaks" found are each individual peak within the EIC at each rt slice, but I also want to be sure I'm not doing anything wildly wrong.) Thank you so much!

jorainer commented 2 years ago

Essentially, you're running the peak detection on two completely different data sets. The chromatogram call as you use it above returns a total ion chromatogram of your data (the top of your plot also shows 50.1 - 643.2, which means that the signal in each spectrum was summed over that m/z range). The findChromPeaks(cropped, ...) call, on the other hand, first identifies m/z-rt slices (the so-called ROI) in which the peak detection is then performed.

So, to summarize, you get different results because the input data represents something completely different. Calling chromatogram without specifying an m/z range reduces the data to a total ion chromatogram, i.e. you lose all information in the m/z dimension. The standard way to use findChromPeaks is to call it on the full data, i.e. the cropped variable in your case. If you want to test the peak detection settings first, you can (and should) use chromatogram, but only to extract an EIC representing the trace of an individual ion (by specifying a narrow m/z range that should contain the signal of the ion of a certain compound). Hope I explained it well.
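To make the TIC/EIC distinction concrete, here is a small base R sketch on simulated data. It does not use xcms; the two ions, their m/z values, apex times and intensities are all made up for illustration.

```r
# Toy illustration (simulated data, not xcms output): two ions that elute
# at different retention times and have different m/z values.
rt <- seq(0, 60, by = 0.5)   # retention times in seconds (assumed spacing)
mz <- c(100.05, 250.10)      # one m/z value per simulated ion (made up)

# Gaussian elution profile; apex, width and height are arbitrary choices.
profile <- function(apex, sd, height) {
  height * exp(-(rt - apex)^2 / (2 * sd^2))
}
ints <- cbind(profile(20, 2, 1000), profile(40, 2, 300))
colnames(ints) <- mz

# TIC: sum over all m/z values per scan, so the m/z dimension is gone.
tic <- rowSums(ints)

# EIC: keep only the signal inside a narrow m/z window around one ion.
eic_for <- function(mz_range) {
  keep <- mz >= mz_range[1] & mz <= mz_range[2]
  rowSums(ints[, keep, drop = FALSE])
}
eic_ion2 <- eic_for(c(250.0, 250.2))

rt[which.max(tic)]       # apex of the TIC, dominated by the stronger ion
rt[which.max(eic_ion2)]  # apex of the second ion only
```

With real data, the analogous step in xcms would be something like chromatogram(cropped, mz = c(lower, upper)), where lower and upper are placeholders for a narrow window around the m/z of a known compound.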

LaneMatthewJ commented 2 years ago

You explained it perfectly! Thank you! That's what I was assuming was happening, though I do have a quick follow up question:

Since calling findChromPeaks on the chromatogram searches only the TIC, would tuning fwhm and snthresh based on the TIC output even be viable? They're definitely related, but now I'm wondering whether my values are far too low.

Thank you so much for all of your help!

jorainer commented 2 years ago

I would rather define the parameters on an EIC of a single compound. Note also that with MatchedFilterParam you expect all your peaks to have the same width (in the retention time dimension), whereas CentWaveParam allows detecting peaks with different widths.
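The fixed-width assumption can be illustrated with a toy filter in base R. This is only a sketch and not the xcms implementation: the kernel shape (a negative second-derivative Gaussian), the scan spacing and the peak widths are assumptions chosen for the example.

```r
# Toy sketch (not xcms): filter a trace with one fixed-width kernel and
# compare the filter response at the apex of two equally high peaks.
fwhm  <- 5                                # assumed model peak width, seconds
sigma <- fwhm / (2 * sqrt(2 * log(2)))    # convert FWHM to Gaussian sd
step  <- 0.25                             # assumed scan spacing, seconds
rt    <- seq(0, 200, by = step)

gauss <- function(apex, sd) exp(-(rt - apex)^2 / (2 * sd^2))

# One peak as wide as the model peak, one five times wider, same height.
trace <- gauss(50, sigma) + gauss(150, 5 * sigma)

# Symmetric kernel with an odd number of points, truncated near 4 sigma.
half   <- ceiling(4 * sigma / step)
t      <- (-half:half) * step
kernel <- (1 - t^2 / sigma^2) * exp(-t^2 / (2 * sigma^2))

# Centered linear filter; positions too close to the edges become NA.
filtered <- as.numeric(stats::filter(trace, kernel, sides = 2))

resp_matched <- filtered[which.min(abs(rt - 50))]
resp_broad   <- filtered[which.min(abs(rt - 150))]
resp_matched / resp_broad   # how much weaker the broad peak responds
```

In this toy setup the broad peak produces a much weaker response than the peak whose width matches the kernel, even though both have the same height, which is the kind of effect to keep in mind when peak widths vary.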

LaneMatthewJ commented 2 years ago

That makes complete sense. As for using the MatchedFilterParam, I'm working with GCMS output data. I could've sworn XCMS had a GCMS vignette suggesting that matched filter was the best option, but I'm apparently mistaken, as I don't see it anywhere. Regardless, most of our peaks are similar in size, but I'm curious whether centWave would be a better option.

After extracting the fwhm for all of our known peak data, there definitely appears to be some variance. I'm embarrassingly a neophyte grad student here and am relatively new to the world of metabolomics - is centWave a reasonable method for peak picking when it comes to GCMS? The cited publications I've seen have primarily been for LCMS-based input.

Originally, to pick the best parameters, instead of applying the defaults I created combinations of all fwhm and snthresh values within a reasonable range, lowering snthresh as far as 0.01, and created visualizations of the peaks found within the TIC (ultimately trying to get the most peaks possible for a metabolomic profile). Clearly, doing that based on the TIC instead of the entire chromatogram was less than ideal.
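A parameter scan of this kind, run on a single-ion trace rather than the TIC, could be sketched as follows in base R. Everything here is a stand-in: the EIC is simulated, and count_peaks is a hypothetical toy counter (not findChromPeaks) whose noise estimate and thresholding are assumptions made for the example.

```r
# Toy parameter scan on a simulated EIC (not xcms).
set.seed(1)
step <- 0.5                                    # assumed scan spacing, seconds
rt   <- seq(0, 120, by = step)
# One simulated compound peak plus random noise (all values made up).
eic  <- 100 * exp(-(rt - 60)^2 / (2 * 2^2)) + rnorm(length(rt), sd = 1)

# Hypothetical stand-in for a peak detector: filter with a fixed-width
# kernel, then count local maxima above snthresh times a noise estimate.
count_peaks <- function(x, fwhm, snthresh) {
  sigma  <- fwhm / (2 * sqrt(2 * log(2)))
  half   <- ceiling(4 * sigma / step)
  t      <- (-half:half) * step
  kernel <- (1 - t^2 / sigma^2) * exp(-t^2 / (2 * sigma^2))
  f      <- as.numeric(stats::filter(x, kernel, sides = 2))
  f[is.na(f)] <- 0                             # edges lack full kernel support
  noise  <- mad(f)                             # robust spread as a noise proxy
  is_max <- c(FALSE, diff(sign(diff(f))) == -2, FALSE)
  sum(is_max & f > snthresh * noise)
}

# All combinations of the candidate values (ranges are arbitrary examples).
grid <- expand.grid(fwhm = c(2, 5, 10), snthresh = c(0.25, 3, 10))
grid$n_peaks <- mapply(function(f, s) count_peaks(eic, f, s),
                       grid$fwhm, grid$snthresh)
grid
```

Because the simulated trace contains exactly one true peak, any combination that reports many more than one is counting noise, which gives a simple way to judge whether a threshold is too low before running detection on the full data.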

jorainer commented 2 years ago

I'm not so familiar with GC-MS, but AFAIK the MatchedFilterParam should work perfectly fine for that type of data.

I'm closing the issue now - feel free to re-open if needed.