rformassspectrometry / Spectra

Low level infrastructure to handle MS spectra
https://rformassspectrometry.github.io/Spectra/
34 stars 24 forks source link

New countIdentifications() function #222

Closed lgatto closed 2 years ago

lgatto commented 2 years ago

Hi @sgibb and @jorainer

This draft PR add a new function, nSequences() that takes a proteomics Spectra object with identification results (as a sequence spectra variable). It then counts, for each scan (MS1 and MS2) the number of identification that results from that scan. This is obvious and uninteresting for MS2 scans, as these are either 0 or one (depending whether sequence is NA or not). But for MS1 scan, the function could how many identifications where found for in the descendant MS2 scans. This can in turn be used to generate a plot as shown below

image

In addition to your review, I have some questions/comments:

  1. Ok to add this function in it's own file nSequences.R.
  2. I'm not convinced by the name of the function.
  3. The function is quite slow. How could it be improved? I suspect that the current implementation of .filterSpectraHierarchy() is slow for such a use-case, where it is called for all MS1 scans. I can think of a vectorised and/or a C-level implementation. What do you think?
lgatto commented 2 years ago

Yes, let's generalise the name - I'm happy to hear this would also be useful for metabolomics data. But I'm not convinced by annotation. In proteomics, you could have an annotation that doesn't equate to a peptide sequence (i.e. the identification) - for example the presence of an ammonium ion.

What about using the term identification rather than sequence, which is just as good or a match in proteomics and seems to also fit metabolomics.

If you are happy with this, I would rename the function countIdentifications().

jorainer commented 2 years ago

Yes. identification is OK for me.

lgatto commented 2 years ago

@jorainer - should we merge the two accepted PRs despite of the errors (as noted here)

jorainer commented 2 years ago

Yes, I would do so. I will merge and ensure that in the other PR news/versions etc get updated too.

jorainer commented 2 years ago

I will also add an intermediate fix until mzR is (finally - might still need some time) updated.

lgatto commented 2 years ago

And what about pushing to Bioc? Go ahead or do we need to wait?

jorainer commented 2 years ago

I will then push Spectra to BioC (once we have both PR in)

jorainer commented 2 years ago

Done.