rformassspectrometry / Spectra

Low level infrastructure to handle MS spectra
https://rformassspectrometry.github.io/Spectra/

Add a filterPrecursorPeaks function #293

Closed jorainer closed 10 months ago

jorainer commented 1 year ago

Add a function filterPrecursorPeaks that removes peaks with m/z matching the precursor m/z from (fragment) spectra. An additional parameter remove could be used to specify which peaks should be removed: only matching peaks "==", precursor peaks and all above ">=", only those above ">", precursor peaks and all below "<=", only those below "<". The function definition would be:

filterPrecursorPeaks <- function(x, ppm = 20, tolerance = 0, remove = c("==", ">=", ">", "<=", "<"))

lgatto commented 1 year ago

So this is a convenience function that is similar to filterMzValues() but that operates on MS levels > 1 (i.e. those that have a precursor) and automatically uses the precursor m/z. This is indeed useful, but I am wondering about the many remove options other than "==". What are the use cases?

Coming back to the existing filterMzValues() and filterMzRange(): if I remember well, these are given fixed values or ranges that are applied to all spectra of a given MS level. Wouldn't we want a way to pass different m/z values/ranges for different scans, which would then also address the need for a new filterPrecursorPeaks()?

jorainer commented 1 year ago

Use cases for options other than "==" (which removes the precursor peak itself from the fragment spectrum): ">=" is becoming more and more of an option for small molecule annotation (e.g. the spectral entropy similarity score does this by default). It removes the precursor peak and all peaks with an m/z larger than the precursor from the spectrum (for singly charged molecules it's not clear what these larger peaks actually are). ">" would then keep the precursor and simply remove all peaks with an m/z larger than the precursor. These, along with "<" and "<=", are just there to provide all possibilities.
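
To make the options concrete, here is a minimal base R sketch of what each remove value would do to a single fragment spectrum. It is not the Spectra implementation: the helper name remove_precursor_peaks, its arguments mz, intensity and precursor_mz, and the matching rule (absolute tolerance plus the ppm-scaled precursor m/z) are assumptions made for illustration, and the peak values are made up.

```r
# Hypothetical helper, base R only; not part of the Spectra package.
# `mz` and `intensity` are the peaks of one fragment spectrum,
# `precursor_mz` is the precursor m/z of that spectrum.
remove_precursor_peaks <- function(mz, intensity, precursor_mz,
                                   ppm = 20, tolerance = 0,
                                   remove = c("==", ">=", ">", "<=", "<")) {
    remove <- match.arg(remove)
    # Assumed matching rule: a peak matches the precursor if its m/z is
    # within `tolerance` plus `ppm` parts-per-million of the precursor m/z.
    delta <- tolerance + precursor_mz * ppm * 1e-6
    lower <- precursor_mz - delta
    upper <- precursor_mz + delta
    # Logical vector flagging the peaks to drop for the chosen option.
    drop <- switch(remove,
        "==" = mz >= lower & mz <= upper,  # only the precursor peak(s)
        ">=" = mz >= lower,                # precursor and everything above
        ">"  = mz > upper,                 # only peaks above the precursor
        "<=" = mz <= upper,                # precursor and everything below
        "<"  = mz < lower)                 # only peaks below the precursor
    data.frame(mz = mz[!drop], intensity = intensity[!drop])
}

# Made-up example spectrum with the precursor peak at m/z 195.0877.
mz <- c(56.05, 110.07, 180.10, 195.0877, 250.20)
intensity <- c(120, 340, 90, 1000, 15)
precursor_mz <- 195.0877

res_eq <- remove_precursor_peaks(mz, intensity, precursor_mz, remove = "==")
res_ge <- remove_precursor_peaks(mz, intensity, precursor_mz, remove = ">=")
res_gt <- remove_precursor_peaks(mz, intensity, precursor_mz, remove = ">")
```

With these values, "==" drops only the peak at 195.0877, ">=" drops that peak and the one at 250.20, and ">" drops only the peak at 250.20 while keeping the precursor.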

And yes, I will have a look into filterMzRange - maybe we can find a way to (re)use that function also for this use case.

jorainer commented 1 year ago

I had a look into filterMzRange and it is not (easily) possible to also include the precursor m/z of each spectrum in the m/z range so that each spectrum is filtered individually.

Also, from the user perspective I believe it might be easier to understand (and find) a dedicated filterPrecursorPeaks function, since the task is to remove potential precursor peaks from each spectrum (or precursor peaks and all peaks with an m/z >= the precursor m/z). So, I would go for the original option.

jorainer commented 1 year ago

In essence, what we want to do here is to filter peaks within each spectrum with m/z values relative to the precursor. The most common use cases will be

For the name of the function, we could also use

lgatto commented 1 year ago

OK for the original option.