Closed stanstrup closed 2 years ago
What you describe, is this happening during the integration of the detected peak (where the "mz", "rt" and "into" are calculated)?
The mzCenterFun parameter of CentWaveParam should actually allow you to define how the m/z of the peak is calculated. The default is wMean, which should be an intensity-weighted mean. Maybe you can test different options (e.g. "apex", "mean") to see what changes?
mzCenterFun defines how the mass is chosen/calculated across the scans. The issue seems to be that if there are multiple m/z peaks inside the same scan (and inside the ROI), the simple mean of their m/z values is used (and the intensities are summed), supposedly before any integration.
It is unclear to me when this happens.
An example: I have a compound with theoretical mass 522.3554. I create a ROI in a 200 ppm window, i.e. 522.2505 - 522.4595.
In the apex scan I have peaks at:
mz int
522.3558 3e4
522.4548 2e1
What I get in the peak table with mzCenterFun = "apex" is mz = 522.4053, which is the simple mean of the two peaks in the scan. So it is 95 ppm off even though the main peak is spot on. It is distorted by the small noise peak.
The intensity-weighted mean would have been 522.3559, which is just 1 ppm off.
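The arithmetic above can be reproduced with a few lines of base R. This is only a sketch of the by-hand calculation using the two centroids from the example. It does not call xcms, and the helper ppm_error is a name made up for illustration.

```r
# Two centroids from the apex scan in the example above
mz  <- c(522.3558, 522.4548)
int <- c(3e4, 2e1)
theoretical <- 522.3554

# Hypothetical helper: signed mass error in ppm
ppm_error <- function(observed, expected) {
  (observed - expected) / expected * 1e6
}

simple_mean   <- mean(mz)                    # what ends up in the peak table
weighted_mean <- weighted.mean(mz, w = int)  # intensity-weighted alternative

round(simple_mean, 4)    # 522.4053
round(weighted_mean, 4)  # 522.3559

ppm_error(simple_mean, theoretical)    # roughly 95 ppm
ppm_error(weighted_mean, theoretical)  # just under 1 ppm
```

The noise peak carries less than 0.1 % of the summed intensity, so it barely moves the weighted mean, while it pulls the unweighted mean halfway towards itself.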
As I said, I cannot figure out where this summing/averaging is happening, but I guess it is hardcoded somewhere. With normal data and reasonably narrow ROIs this rarely matters. A case where it does matter would actually be the Orbitrap data we just made the filter for. It is the same issue: multiple peaks in the same scan within the ROI/ppm parameters. If we used the weighted mean there, we might not even need the filter.
OK, thanks for checking that. I will dig a little more into the code.
Can you please also try the following option, just to see what (if anything) changes:
options(originalCentWave = FALSE)
and then run the CentWave peak detection?
Might be that I need to adapt/update some C functions :(
> Can you please also try the following option, just to see what (if anything) changes:
> options(originalCentWave = FALSE)
> and then run the CentWave peak detection?
I did. Same result.
Yeah I was wondering too if it was hiding in some EIC generation step...
OK - so I'll have to dig into the (undocumented) C-code.
During peak detection, centWave extracts the m/z for the peaks in the ROI here. The peak's m/z is then calculated based on these.
The getMZ function is defined here, and in this line the function calculates the mean m/z if the ROI contains more than one peak for a single scan (spectrum), as I assume is true in your case.
Yes! That must be it. This is where I suggest an intensity-weighted average would be much better.
Or simply report the m/z of the maximum-intensity peak? That would be computationally less intensive.
Yes that should work OK as well.
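The two options discussed here (intensity-weighted average versus the m/z of the most intense peak) can be compared with a small base R sketch. It is not the xcms C implementation of getMZ. The data frame roi_peaks and the helpers collapse_scan and collapse_roi are made up for illustration, and summing the intensities per scan simply mirrors the behaviour described earlier in this thread.

```r
# Hypothetical centroids falling inside one ROI; scans 1 and 2 contain
# a real peak plus a small noise peak, scan 3 contains a single peak.
roi_peaks <- data.frame(
  scan = c(1, 1, 2, 2, 3),
  mz   = c(522.3556, 522.4550, 522.3558, 522.4548, 522.3557),
  int  = c(1e4, 1.5e1, 3e4, 2e1, 1.2e4)
)

# Reduce all centroids of one scan to a single (mz, int) pair.
collapse_scan <- function(d, method = c("wmean", "max")) {
  method <- match.arg(method)
  mz <- if (method == "wmean") {
    weighted.mean(d$mz, w = d$int)  # intensity-weighted average
  } else {
    d$mz[which.max(d$int)]          # m/z of the most intense centroid
  }
  data.frame(scan = d$scan[1], mz = mz, int = sum(d$int))
}

# Apply the per-scan reduction to every scan in the ROI.
collapse_roi <- function(peaks, method = "wmean") {
  per_scan <- lapply(split(peaks, peaks$scan), collapse_scan, method = method)
  do.call(rbind, per_scan)
}

by_wmean <- collapse_roi(roi_peaks, "wmean")
by_max   <- collapse_roi(roi_peaks, "max")

# Largest per-scan difference between the two strategies, in m/z units
max(abs(by_wmean$mz - by_max$mz))
```

For made-up data like this, where the noise peaks are orders of magnitude weaker than the real signal, both strategies give nearly identical m/z values, which is consistent with the comment that either would work.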
I've implemented an intensity weighted average m/z calculation in the fix_peak_mz branch. Can you please check if that works for your data?
Remembered to push? ;)
Hm, yes, I did. You should be able to install with BiocManager::install("sneumann/xcms", ref = "fix_peak_mz").
Ah, sorry, you're right. I forgot to push. Now it's updated.
Hi @stanstrup - did you have the chance to try it? On my data the m/z of the identified chromatographic peaks slightly change (improve?).
This should now be fixed (version >= 3.17.5). Closing the issue now, feel free to re-open if needed.
I encountered an issue that happens when more than one m/z value is inside the ROI for a scan. It seems that what XCMS does is take the average m/z value and sum the intensities of all peaks inside the ROI. This can lead to wildly off m/z values when the ROI contains noise.
My use case is that I am targeting certain peaks (by pre-calculating ROIs) for QC purposes, and since we had mass stability issues I need to use a pretty wide m/z range to be sure to actually find the peak. But this also causes some tiny noise peaks to be included, and the mass returned is then way off.
I could not find the code that actually does this, but I calculated by hand that the above is what happens. What I would suggest is to combine the m/z values by intensity-weighted mean instead (as can be done across the peak). I think this might also be a gentler solution to the Orbitrap shoulder issues than the filter we just implemented in Spectra...
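For context, pre-calculated ROIs of the kind described above could be built roughly as follows. This is a base R sketch under assumptions. The helper make_roi is made up, the scan range is hypothetical, and the element names (mz, mzmin, mzmax, scmin, scmax, length, intensity) follow my understanding of what the roiList argument of CentWaveParam expects, so please check them against the xcms documentation before relying on them.

```r
# Hypothetical helper: one ROI per target m/z with a symmetric ppm tolerance.
make_roi <- function(target_mz, ppm, scmin, scmax) {
  delta <- target_mz * ppm / 1e6
  list(
    mz        = target_mz,
    mzmin     = target_mz - delta,
    mzmax     = target_mz + delta,
    scmin     = scmin,              # first scan index of the ROI (assumed)
    scmax     = scmax,              # last scan index of the ROI (assumed)
    length    = scmax - scmin + 1,  # number of scans covered
    intensity = -1                  # placeholder value, an assumption
  )
}

targets <- c(522.3554)  # theoretical mass from the example in this thread
rois <- lapply(targets, make_roi, ppm = 200, scmin = 1, scmax = 500)

rois[[1]]$mzmin
rois[[1]]$mzmax
```

With a tolerance of 200 ppm this gives a window of about 0.1045 on either side of the target, close to the range quoted in the example. The wider the window, the more likely it is that a scan contributes more than one centroid to the ROI, which is exactly the situation where the per-scan m/z averaging matters.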