rformassspectrometry / Spectra

Low level infrastructure to handle MS spectra
https://rformassspectrometry.github.io/Spectra/
34 stars 24 forks source link

Add fusePeaks function #292

Closed jorainer closed 1 year ago

jorainer commented 1 year ago

Add a function that allows to fuse/combine (aggregate?) peaks within each spectrum. The need (usability) of such a function comes from https://github.com/rformassspectrometry/MsCoreUtils/issues/116 . Basically, the idea is to fuse all peaks with an m/z smaller than a threshold into a single peak by combining/fusing their intensities and m/z values.

This is in contrast to the reducePeaks that reports for each peak group only a single (representative) peak. This function is supposed to combine the signals of the peaks within the peak group.

A definition could be:

fusePeaks <- function(x, ppm = 20, tolerance = 10, intensityFun = max, mzFun = mean, weighted = TRUE)

intensityFun allows to specify the function to combine the intensity values. mzFun how m/z values should be fused into a single value. If weighted = TRUE an intensity weighted mean will be used, regardless of mzFun (which would only be considered if weighted = FALSE.

Possible names would be:

lgatto commented 1 year ago

I personally prefer combinePeaks() (definitely not combineSpectra(), for the same reason you mention in #291, that the function operates on peaks), but could live with fusePeaks().

jorainer commented 1 year ago

Unfortunately we have already a combinePeaks function implemented. That one is called by combineSpectra and its purpose is to combine peaks from different spectra, e.g. 4 spectra (and their peak matrices) are supposed to be combined into a single spectrum (and peak matrix).

We can of course have an additional combinePeaks,Spectra method, that does something slightly different (i.e. combine peaks within each spectrum), but we need to clearly document these differences.

lgatto commented 1 year ago

So may be the existing one should rather be called combineSpectra() - this would be a confusing change, but would lead to the cleanest option.

jorainer commented 1 year ago

Sorry, I was not explaining well:

combineSpectra is a function to combine subsets of a Spectra into single spectra (parameter f defines which spectra in Spectra should be combined). The function has a parameter FUN that allows to pass a function to merge the peak matrices of the sets of spectra. The default is FUN = combinePeaks. This combinePeaks function takes a list of peak matrices and combines them into a single peak matrix.

We could now have a combinePeaks,list method that uses the original code (for the use case above) and a combinePeaks,Spectra for the new use case (i.e. merge peaks within each spectra). In theory, we can re-use part of the code, since we're now in essence combining peaks on a single peak matrix. So the combinePeaks,list would be the special case.

jorainer commented 1 year ago

Before thinking of an implementation - just checking if the following definition would not be too confusing for the user:

We could also think of an alternative name of the function if using the same name (combinePeaks) is confusing. Maybe combinePeaksPerSpectrum or combinePeaksInSpectrum. Happy for feedback @lgatto ...

jorainer commented 1 year ago

Or - like you suggest, we rename the old combinePeaks into e.g. combinePeaksData and use combinePeaks for the new functionaltity (making it a method and forwarding combinePeaks,list to combinePeaksData. That should be the cleanest solution since peaksData is in fact a list of peak matrices, so combinePeaksData would make more sense for the old functionality anyway.

lgatto commented 1 year ago

👍🏻

jorainer commented 1 year ago

Implemented.