Closed lgatto closed 2 years ago
Yes, let's generalise the name - I'm happy to hear this would also be useful for metabolomics data. But I'm not convinced by annotation. In proteomics, you could have an annotation that doesn't equate to a peptide sequence (i.e. the identification) - for example the presence of an ammonium ion.
What about using the term identification rather than sequence? It is just as good a match in proteomics and also seems to fit metabolomics.
If you are happy with this, I would rename the function countIdentifications().
Yes, identification is OK for me.
@jorainer - should we merge the two accepted PRs despite the errors (as noted here)?
Yes, I would do so. I will merge and ensure that in the other PR news/versions etc get updated too.
I will also add an intermediate fix until mzR is (finally - might still need some time) updated.
And what about pushing to Bioc? Go ahead or do we need to wait?
I will then push Spectra to BioC (once we have both PRs in).
Done.
Hi @sgibb and @jorainer
This draft PR adds a new function, nSequences(), that takes a proteomics Spectra object with identification results (as a sequence spectra variable). It then counts, for each scan (MS1 and MS2), the number of identifications that result from that scan. This is obvious and uninteresting for MS2 scans, as these are either 0 or 1 (depending on whether sequence is NA or not). But for MS1 scans, the function counts how many identifications were found in the descendant MS2 scans. This can in turn be used to generate a plot as shown below.
In addition to your review, I have some questions/comments:
nSequences.R
.filterSpectraHierarchy() is slow for such a use case, where it is called for all MS1 scans. I can think of a vectorised and/or a C-level implementation. What do you think?
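To make the vectorised idea concrete, here is a minimal base R sketch of counting identifications per scan without looping over MS1 scans. It is not the code in this PR: count_ids_vectorised() is a hypothetical helper, and the plain vectors are toy stand-ins for the msLevel, acquisitionNum, precScanNum and sequence spectra variables. It assumes a single file (unique acquisition numbers) and only direct MS1 to MS2 relationships, so deeper hierarchies such as MS3 would need more work.

```r
# Hypothetical toy data: plain vectors standing in for the spectra variables
# msLevel, acquisitionNum, precScanNum and sequence of a Spectra object.
ms_level  <- c(1L, 2L, 2L, 2L, 1L, 2L, 2L)
acq_num   <- 1:7
prec_scan <- c(NA, 1L, 1L, 1L, NA, 5L, 5L)
sequence  <- c(NA, "PEPTIDEK", NA, "SAMPLER", NA, NA, "ANOTHERK")

# Hypothetical helper (not part of Spectra): count identifications per scan.
# Assumes acquisition numbers are unique, i.e. data from a single file.
count_ids_vectorised <- function(ms_level, acq_num, prec_scan, sequence) {
    identified <- !is.na(sequence)
    # MS2 scans: 1 if identified, 0 otherwise.
    n <- as.integer(identified)
    # Tabulate the precursor scan numbers of all identified MS2 scans
    # against every acquisition number in one pass.
    is_ms2 <- ms_level == 2L
    parent <- factor(prec_scan[is_ms2 & identified], levels = acq_num)
    per_parent <- as.integer(table(parent))
    # MS1 scans: number of identified direct MS2 descendants.
    is_ms1 <- ms_level == 1L
    n[is_ms1] <- per_parent[is_ms1]
    n
}

counts <- count_ids_vectorised(ms_level, acq_num, prec_scan, sequence)
print(counts)
```

On this toy input the result is 2 1 0 1 1 0 1: the first MS1 scan has two identified MS2 descendants and the second has one. The tabulation replaces the per-MS1 hierarchy lookup, which is where the time would be saved.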