rformassspectrometry / Spectra

Low level infrastructure to handle MS spectra
https://rformassspectrometry.github.io/Spectra/
37 stars 25 forks source link

Adding a filterPrecursorCharge() filter function? #192

Open lgatto opened 3 years ago

lgatto commented 3 years ago

@jorainer - any objection?

jorainer commented 3 years ago

Sounds good to me - will it filter on a range or a single value?

lgatto commented 3 years ago

Range would be better indeed.

jorainer commented 3 years ago

Then it's similar to the filterPrecursorMz. Maybe you could also add the generic to ProtGenerics then?

lgatto commented 3 years ago

Sure, will do.

lgatto commented 3 years ago

I have this implementation working:

> library(rpx)
> px <- PXDataset("PXD000001")
> f <- pxget(px, "TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01-20141210.mzML")
Loading TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01-20141210.mzML from cache.
> library(Spectra)
> sp <- Spectra(f)
> table(msLevel(sp), precursorCharge(sp))

       2    3    4    5    6
  1    0    0    0    0    0
  2 3749 1955  368   25    6
> sp2 <- filterPrecursorCharge(sp, 2:3)
> table(msLevel(sp2), precursorCharge(sp2))

       2    3
  2 3749 1955

which however leads me to question the result, which also applies to filterPrecursorMz (and possibly others).

@jorainer We loose all MS1 spectra, given that we keep those that have a precursorCharge. Shouldn't we also keep all other MS levels?

jorainer commented 3 years ago

I think this depends on the main purpose/use case for the function. I use filterPrecursorMz mostly for spectra matching where I only want to compare the query spectrum against target (reference) spectra with matching precursor m/z. Thus I actually want to drop all MS1 spectra or those MS2 spectra without precursor m/z. To be consistent it would make sense to have the same behaviour for filterPrecursorCharge.

I think this is also the behaviour users might expect from such a function: if I want to filter by precursor, I will only get those for which the condition (precursor m/z or charge) matches - I might be confused if I get also MS1 spectra in return as they don't have a precursor. But that's only my opinion, happy to discuss.

lgatto commented 3 years ago

I agree with you, but this is trivial:

x %>%
   filterMsLevel(2L) %>%
   filterPrecursorMz(c(500, 501))

instead of

x %>%
   filterPrecursorMz(c(500, 501))

However, the other way is more convoluted (not tested)

x <- c(filterMsLevel(x, 1L), 
       filterPrecursorMz(x, c(500, 501)))

and would probably need some rearrangement according to rtime. I suppose there's another way around:

  1. filterPrecursorMz(x, c(500, 501))$spectrumId (or any similar primary key).
  2. And the use filterPrecusorScan()

My use case would be to I keep MS2 scans and have their precursor's scans, to check the precuror peak in its MS1 and look for contaminating ions in the selection window.

jorainer commented 3 years ago

Well, then, the other possibility is to add, like for filterMzValues, an additional parameter msLevel. which allows to specify on which MS level the filtering should be performed and retains all spectra from the other MS levels. But then we should also change this in filterPrecursorMz.

lgatto commented 3 years ago

I don't think your suggestion is very elegant, because filterPrecursor* are only applicable to MS2 scans (or MS3). So if the default value is 2, a user would need to specify 1:2 to get rid of MS1 scans, and if the default is uniqueMsLevel(), then a user would need to specify 2 to keep MS1 scans. I think this is a bit convoluted and implicit, as opposed to explicitly filtering on the MS levels.

In general terms, there are two possibilities:

  1. filter function should filter on their spectra variable and ignore those spectra that have an NA
  2. filter functions get a second argument to specify which MS level to act on and leave the other ones

I personally prefer the first option because

jorainer commented 3 years ago

Note that by adding the msLevel. parameter we would be in line with the other filter function we already have. It is implicit, yes, but we would then be consistent with the other functions. But I agree that I did not find this parameter very helpful in the first place and also got confused by it.

Still, I personally don't like option 1) because R in general does either drops NAs or returns NA for NA. I find returning all spectra that have a precursor m/z of NA misleading. To me, calling filter* means that everything which matches is returned and everything else (including NA) is dropped or NA is returned.

Could adding a parameter na.rm = FALSE be a compromise? With na.rm = FALSE those spectra with a precursor* of NA are returned, with na.rm = TRUE they are dropped?

lgatto commented 3 years ago

Could adding a parameter na.rm = FALSE be a compromise? With na.rm = FALSE those spectra with a precursor* of NA are returned, with na.rm = TRUE they are dropped?

That could be a good compromise indeed. With na.rm = FALSE being the default value? Let me think a bit more about it.

lgatto commented 3 years ago

Note that by adding the msLevel. parameter we would be in line with the other filter function we already have.

We have this parameter in filterIntensity(), replaceIntensitiesBelow(), filterMzRange(), filterMzValues(), filterRt(), bin(), pickPeaks() and smooth() (and others that I might have missed). But these don't deal with any of the precursor spectra variables missing for MS1 scans, can be applied on any MS level, and hence the msLevel. is important to control which ones to process. I think this is a different situation than the filterPrecursor*() cases and it fine as it is.