rformassspectrometry / Spectra

Low level infrastructure to handle MS spectra
https://rformassspectrometry.github.io/Spectra/

Suggestion for combinePeaksData #320

Closed WallFacerLR closed 1 month ago

WallFacerLR commented 1 month ago

In the description of combinePeaksData: combinePeaksData aggregates provided peak matrices into a single peak matrix. Peaks are grouped by their m/z values with the group() function from the MsCoreUtils package. In brief, all peaks in all provided spectra are first ordered by their m/z and consecutively grouped into one group if the (pairwise) difference between them is smaller than specified with parameter tolerance and ppm.
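To make the grouping rule quoted above concrete, here is a minimal base R sketch of consecutive grouping by m/z. It is a simplified illustration, not the actual code of group() in MsCoreUtils: the function name group_consecutive is hypothetical, and computing the ppm-based window relative to the lower of two neighbouring m/z values is an assumption.

```r
# Hypothetical helper: assign consecutive group ids to sorted m/z values.
# Two neighbouring peaks end up in the same group if their difference does
# not exceed `tolerance` plus `ppm` parts-per-million of the lower m/z
# (assumption; the real MsCoreUtils::group() may differ in detail).
group_consecutive <- function(mz, tolerance = 0, ppm = 0) {
    if (!length(mz))
        return(integer())
    stopifnot(!is.unsorted(mz))
    gap <- diff(mz)
    allowed <- tolerance + ppm * mz[-length(mz)] * 1e-6
    # A new group starts whenever the gap to the previous peak is too large.
    cumsum(c(TRUE, gap > allowed))
}

# Peaks of spectrum 1 from the example in this issue, grouped with ppm = 10.
grp_sp1 <- group_consecutive(c(78.27686, 78.27757, 78.27821), ppm = 10)
grp_sp1
```

Because grouping is consecutive, a chain of closely spaced peaks can end up in one group even if the first and last peak are further apart than the ppm window, which is how low-intensity peaks next to a signal peak get merged with it.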

In many cases, lower-magnitude noise peaks lie around a signal peak within the allowed ppm range. The aggregation function intensityFun treats noise and signal peaks from different spectra equally, but what we actually want is for only the signal peaks from the different spectra to be aggregated.

Here is an example of the intermediate data for one m/z group in combinePeaksData, with ppm = 10.

        mzs       ints sp
1  78.27686  2452.415  1
2  78.27757  8647.451  1
3  78.27821 39550.324  1
4  78.27616  1716.824  2
5  78.27757 11083.815  2
6  78.27820 43089.883  2
7  78.27685  2501.988  3
8  78.27757 11999.393  3
9  78.27819 46866.879  3
10 78.27609  2201.728  4
11 78.27688  3311.003  4
12 78.27757  9960.227  4
13 78.27818 51152.867  4
14 78.27612  3122.156  5
15 78.27754 10740.626  5
16 78.27817 52989.051  5
17 78.27610  3995.448  6
18 78.27754 11588.479  6
19 78.27818 61423.824  6
20 78.27809 18063.707  7

In this case, we want to select the maximum peak for each spectrum and then apply intensityFun. The lower-magnitude noise peaks reduce the aggregated intensity considerably. As a simple statistic, this problem occurs in 7 of the 52 m/z groups in this case (lengths(mzs) > length(sp), which means noise peaks are involved).

This problem might be mitigated by a more suitable ppm, but an aggregation step within each spectrum before aggregating between spectra may be more useful in some situations.
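The effect can be sketched in base R with the m/z group from the table above. This is only an illustration: it assumes mean as the intensity aggregation function and is not the internal code of combinePeaksData; the variable names are made up for this example.

```r
# Peaks of the m/z group shown above (values copied from the table).
grp <- data.frame(
    mzs = c(78.27686, 78.27757, 78.27821, 78.27616, 78.27757, 78.27820,
            78.27685, 78.27757, 78.27819, 78.27609, 78.27688, 78.27757,
            78.27818, 78.27612, 78.27754, 78.27817, 78.27610, 78.27754,
            78.27818, 78.27809),
    ints = c(2452.415, 8647.451, 39550.324, 1716.824, 11083.815, 43089.883,
             2501.988, 11999.393, 46866.879, 2201.728, 3311.003, 9960.227,
             51152.867, 3122.156, 10740.626, 52989.051, 3995.448, 11588.479,
             61423.824, 18063.707),
    sp = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 6, 6, 6, 7)
)

# Aggregating all 20 peaks at once: the noise peaks pull the mean down.
mean_all <- mean(grp$ints)

# Keep only the most intense peak per spectrum, then aggregate those 7 peaks.
top_per_spectrum <- tapply(grp$ints, grp$sp, max)
mean_top <- mean(top_per_spectrum)

c(all_peaks = mean_all, top_per_spectrum = mean_top)
```

The second value is the one the suggestion is after: each spectrum contributes exactly one peak, so the result does not depend on how many noise peaks happen to fall inside the ppm window.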

jorainer commented 1 month ago

Good point @WallFacerLR. What I would suggest instead (of changing the combinePeaksData() function) is to run reduceSpectra() before aggregating the spectra with combineSpectra() (which will then use the combinePeaksData() function). The reduceSpectra() function will do exactly what you suggest: it will group peaks within each spectrum and then report only the mass peak with the highest intensity for each group. You could then either extract the list of peaks data using peaksData() from the Spectra object and run combinePeaksData() manually, or simply call combineSpectra() on the Spectra object. So, summarizing, the workflow you suggest could be:

sps |>
    reduceSpectra(tolerance = 0, ppm = 10) |>
    combineSpectra(tolerance = 0, ppm = 10)

You would obviously need to check that the default for parameter f in combineSpectra() matches the sets of spectra you would like to combine, or otherwise define the sets of spectra you want to combine with parameter f.

WallFacerLR commented 1 month ago

Thanks for reminding me of this function. It does exactly this job. Maybe this should be mentioned in the documentation of combineSpectra or combinePeaksData, because it is necessary when aggregating multiple spectra. As a user, when I read the documentation of these functions and section 3.6 "Aggregating spectra data" of the vignette, I was not aware of this problem or of the solution above.

jorainer commented 1 month ago

Yes, good point! I will update the documentation accordingly!

jorainer commented 1 month ago

Updated the documentation of the current (release) version. Closing issue now - feel free to re-open if needed.