sneumann / xcms

This is the git repository matching the Bioconductor package xcms: LC/MS and GC/MS Data Analysis

when more than one m/z is inside the ROI the m/z can be way off #590

Closed stanstrup closed 2 years ago

stanstrup commented 2 years ago

I encountered an issue that happens when more than one m/z value is inside the ROI for a scan. It seems that what XCMS does is take the average m/z value and sum the intensities of all peaks inside the ROI. This can lead to wildly off m/z values when the ROI contains noise.

My use case is that I am targeting certain peaks (by pre-calculating ROIs) for QC purposes, and since we had mass stability issues I need to set a pretty wide m/z range to be sure to actually find the peak. But this also causes some tiny noise peaks to be included, and the mass returned is then way off.

I could not find the code that actually does this, but I calculated by hand that the above is what happens. What I would suggest is to combine the m/z values by intensity-weighted mean instead (as can be done across the peak). I think this might also be a gentler solution to the Orbitrap shoulder issues than the filter we just implemented in Spectra...

jorainer commented 2 years ago

What you describe, is this happening during the integration of the detected peak (where the "mz", "rt" and "into" are calculated)?

The mzCenterFun parameter of CentWaveParam should actually allow you to define how the m/z of the peak is calculated. The default is wMean which should be an intensity-weighted mean. Maybe you can test different options (e.g. "apex", "mean") to see what changes?

stanstrup commented 2 years ago

mzCenterFun defines how the mass is chosen/calculated across the scans. The issue seems to be that if there are multiple m/z peaks inside the same scan (and inside the ROI), the simple mean is used for the m/z (and the intensities are summed), supposedly before any integration. It is unclear to me when this happens.

An example: I have a compound with a theoretical mass of 522.3554. I create a ROI with a 200 ppm window, i.e. 522.2505 - 522.4595.

In the apex scan I have peaks at:

mz            int
522.3558      3e4
522.4548      2e1

What I get in the peak table with mzCenterFun = "apex" is mz = 522.4053, which is the simple mean of the two peaks in the scan. So it is 95 ppm off even though the main peak is spot on; it is distorted by the small noise peak. The weighted mean would have been 522.3559, which is just 1 ppm off.
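The arithmetic behind these numbers can be checked with a short sketch. It uses only the two centroids and the theoretical mass quoted above; the helper name `ppm_error` is made up for illustration and nothing here calls xcms.

```python
# Centroids reported for the apex scan in the example above.
mz = [522.3558, 522.4548]
intensity = [3e4, 2e1]
theoretical = 522.3554


def ppm_error(observed, reference):
    """Signed deviation of observed from reference, in parts per million."""
    return (observed - reference) / reference * 1e6


# Plain mean: every centroid counts equally, so the noise peak pulls hard.
simple_mean = sum(mz) / len(mz)

# Intensity-weighted mean: each centroid counts in proportion to its intensity.
weighted_mean = sum(m * i for m, i in zip(mz, intensity)) / sum(intensity)

print(f"simple mean:   {simple_mean:.4f}  ({ppm_error(simple_mean, theoretical):.1f} ppm)")
print(f"weighted mean: {weighted_mean:.4f}  ({ppm_error(weighted_mean, theoretical):.1f} ppm)")
```

The noise peak carries 20 of 30020 intensity units, so with weighting it shifts the result by well under 1 ppm instead of roughly 95 ppm.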

As I said, I cannot figure out where this summing/averaging is happening, but I guess it is hardcoded somewhere. With normal data and reasonably narrow ROIs this rarely matters. A case where it does matter would actually be the Orbitrap data we just made the filter for. It is the same issue: multiple peaks in the same scan within the ROI/ppm parameters. If we used the weighted mean there we might not even need the filter.

jorainer commented 2 years ago

OK, thanks for checking that. I will dig a little more into the code.

jorainer commented 2 years ago

Can you please also try the following option, just to see if anything changes:

options(originalCentWave = FALSE)

and then run the CentWave peak detection?

jorainer commented 2 years ago

Might be that I need to adapt/update some C functions :(

stanstrup commented 2 years ago

> Can you please also try the following option, just to see if anything changes:
>
> options(originalCentWave = FALSE)
>
> and then run the CentWave peak detection?

I did. Same result.

Yeah I was wondering too if it was hiding in some EIC generation step...

jorainer commented 2 years ago

OK - so I'll have to dig into the (undocumented) C-code.

jorainer commented 2 years ago

During peak detection, centWave extracts the m/z for the peaks in the ROI here. The peak's m/z is then calculated based on these.

The getMZ function is defined here, and in this line the function calculates the mean m/z if the ROI contains more than one peak for a single scan (spectrum), as I assume is true for your case.
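As a rough illustration of the behaviour described in this thread (not a port of the C code), the sketch below collapses each scan's in-range centroids to a plain mean m/z with summed intensity, then takes the m/z at the most intense scan as an "apex"-style choice. The function name `collapse_scan` and the first and third scans are hypothetical; the middle scan and the ROI limits come from the example earlier in the thread.

```python
def collapse_scan(centroids, mz_min, mz_max):
    """Reduce one scan to a single (mz, intensity) pair for an ROI.

    Uses the plain mean of the in-range m/z values and the sum of their
    intensities. Returns None if no centroid falls in the m/z range.
    """
    inside = [(m, i) for m, i in centroids if mz_min <= m <= mz_max]
    if not inside:
        return None
    mean_mz = sum(m for m, _ in inside) / len(inside)
    total_intensity = sum(i for _, i in inside)
    return mean_mz, total_intensity


# Three consecutive scans; scans 1 and 3 are invented for illustration.
scans = [
    [(522.3556, 1.0e4)],
    [(522.3558, 3.0e4), (522.4548, 2.0e1)],  # apex scan from the thread
    [(522.3557, 1.2e4)],
]
roi = (522.2505, 522.4595)

# One (mz, intensity) point per scan, skipping scans with nothing in range.
trace = [p for p in (collapse_scan(s, *roi) for s in scans) if p is not None]

# "apex"-style choice: m/z of the point with the highest intensity.
apex_mz, apex_intensity = max(trace, key=lambda p: p[1])
print(f"apex m/z: {apex_mz:.4f}, apex intensity: {apex_intensity:.0f}")
```

Because the averaging happens before any across-scan step, picking the apex scan afterwards cannot undo the distortion, which would be consistent with mzCenterFun having no effect in the report above.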

stanstrup commented 2 years ago

Yes! That must be it. This is where I suggest an intensity-weighted average would be much better.

jorainer commented 2 years ago

Or simply report the m/z of the maximum intensity peak? That would be less computationally intensive.

stanstrup commented 2 years ago

Yes that should work OK as well.
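To compare the two proposals, here is a minimal sketch of both per-scan reducers. The names `mz_of_max` and `mz_weighted` are hypothetical, and the "shoulder" scan is invented to show where the two can disagree; only the apex scan comes from the thread.

```python
def mz_of_max(centroids):
    """m/z of the most intense centroid: one comparison pass, no averaging."""
    return max(centroids, key=lambda c: c[1])[0]


def mz_weighted(centroids):
    """Intensity-weighted mean m/z of all centroids in the scan."""
    total = sum(i for _, i in centroids)
    return sum(m * i for m, i in centroids) / total


# Apex scan from the thread: one real peak plus a tiny noise peak.
apex_scan = [(522.3558, 3e4), (522.4548, 2e1)]

# Invented scan with a sizeable neighbouring peak (e.g. a shoulder).
shoulder_scan = [(522.3558, 3e4), (522.3600, 1e4)]

for name, scan in [("noise", apex_scan), ("shoulder", shoulder_scan)]:
    print(f"{name:9s} max: {mz_of_max(scan):.4f}  weighted: {mz_weighted(scan):.4f}")
```

With a tiny noise peak both reducers land within a fraction of a ppm of each other. They only diverge noticeably when the second centroid carries a substantial share of the intensity, in which case the weighted mean falls between the two centroids while the maximum ignores the smaller one entirely.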

jorainer commented 2 years ago

I've implemented an intensity weighted average m/z calculation in the fix_peak_mz branch. Can you please check if that works for your data?

stanstrup commented 2 years ago

Remembered to push? ;)

jorainer commented 2 years ago

Hm, yes, I did. You should be able to install with BiocManager::install("sneumann/xcms", ref = "fix_peak_mz").

jorainer commented 2 years ago

Ah, sorry, you're right. I forgot to push. Now it's updated.

jorainer commented 2 years ago

Hi @stanstrup - did you have the chance to try it? On my data the m/z values of the identified chromatographic peaks change slightly (improve?).

jorainer commented 2 years ago

This should now be fixed (version >= 3.17.5). Closing the issue now, feel free to re-open if needed.