smith-chem-wisc / MetaMorpheus

Proteomics search software with integrated calibration, PTM discovery, bottom-up, top-down and LFQ capabilities
MIT License

Explanation of the FdrCategory Enum in the Engine Layer which has always baffled me. #2402

Closed trishorts closed 3 months ago

trishorts commented 3 months ago

This enum groups peptides for FDR analysis according to their cleavage specificity:
FullySpecific: The peptide is cleaved only at protease-specified cleavage sites.
SemiSpecific: The peptide is cleaved at a protease-specified cleavage site on one terminus and at a non-specific site on the other terminus.
NonSpecific: The peptide is cleaved at non-specific sites on both termini.
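
To make the three definitions concrete, here is a minimal Python sketch (not the MetaMorpheus C# code). It assumes a simplified trypsin-like rule (cleave after K or R, with no proline exception) and treats a protein terminus as a specific site. The names `is_specific_site` and `categorize` and the numeric enum values are hypothetical and chosen only for illustration.

```python
from enum import Enum


class FdrCategory(Enum):
    # Member names follow the issue text; the numeric values are illustrative.
    FullySpecific = 0
    SemiSpecific = 1
    NonSpecific = 2


def is_specific_site(protein: str, boundary: int, residues: str = "KR") -> bool:
    """Return True if a cut at `boundary` is protease-specified.

    `boundary` is the index between protein[boundary - 1] and protein[boundary].
    Assumption: protein termini count as specific sites.
    """
    if boundary == 0 or boundary == len(protein):
        return True
    return protein[boundary - 1] in residues


def categorize(protein: str, start: int, end: int) -> FdrCategory:
    """Categorize the peptide protein[start:end] by how many termini are specific."""
    specific_termini = int(is_specific_site(protein, start)) + int(
        is_specific_site(protein, end)
    )
    if specific_termini == 2:
        return FdrCategory.FullySpecific
    if specific_termini == 1:
        return FdrCategory.SemiSpecific
    return FdrCategory.NonSpecific


protein = "MKAAGRLLSK"
print(categorize(protein, 2, 6).name)  # peptide AAGR
print(categorize(protein, 3, 6).name)  # peptide AGR
print(categorize(protein, 3, 5).name)  # peptide AG
```

In this toy protein, AAGR has a specific site on both termini, AGR only on its C-terminus, and AG on neither.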

In the Speedy Non-Specific Search use case, all three categories are used with the modern search. For each spectrum, the peptide with the lowest q-value is chosen rather than the highest-scoring peptide.
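
The selection rule can be sketched as follows. This is an illustration in Python under stated assumptions, not the engine's implementation: the `Candidate` type, its fields, the example numbers, and the tie-break on score are all hypothetical. Each candidate's q-value is assumed to have been computed within its own category.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    peptide: str
    category: str   # "FullySpecific", "SemiSpecific", or "NonSpecific"
    score: float    # higher is better
    q_value: float  # lower is better; assumed computed per category


def pick_best(candidates: list[Candidate]) -> Candidate:
    """Pick the lowest q-value candidate; break ties by higher score (assumption)."""
    return min(candidates, key=lambda c: (c.q_value, -c.score))


# Made-up candidates for a single spectrum, one per category.
spectrum_candidates = [
    Candidate("AAGR", "FullySpecific", score=10.0, q_value=0.004),
    Candidate("AGR", "SemiSpecific", score=12.0, q_value=0.020),
    Candidate("AG", "NonSpecific", score=11.0, q_value=0.050),
]

best = pick_best(spectrum_candidates)
print(best.peptide, best.category)
```

Here the semi-specific candidate has the highest score, but the fully specific candidate is chosen because its q-value is lowest.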

In a classic NonSpecific search, I believe that only the NonSpecific category is used. Further, I believe that it includes peptides that are cleaved at one or more protease-specified cleavage sites, but also at non-specific sites.

The Single-N or Single-C protease is a special case. The modern search table is populated only with peptide fragments including the specified terminus. Fragments from the other terminus are not included.

This is not the same as Semi-Trypsin, which is a classic search where the protein is first digested into peptides and the database is then extended with the full set of peptides that could be generated by terminal degradation.
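
As a rough illustration of "terminal degradation", the Python sketch below expands one fully specific peptide into the peptides obtained by trimming residues from a single terminus at a time. The function name `terminal_degradation` and the `min_length` parameter are hypothetical, and whether MetaMorpheus builds its Semi-Trypsin database in exactly this way is an assumption.

```python
def terminal_degradation(peptide: str, min_length: int = 2) -> list[str]:
    """Return the peptide plus every product of trimming one terminus.

    Residues are removed from either the N-terminus or the C-terminus
    (never both), so each product keeps one original cleavage site.
    """
    products = {peptide}
    for cut in range(1, len(peptide) - min_length + 1):
        products.add(peptide[cut:])                  # trimmed from the N-terminus
        products.add(peptide[: len(peptide) - cut])  # trimmed from the C-terminus
    # Longest first, then alphabetical, for a stable listing.
    return sorted(products, key=lambda p: (-len(p), p))


print(terminal_degradation("AAGR"))
# ['AAGR', 'AAG', 'AGR', 'AA', 'GR']
```

Every product other than the original peptide has one protease-specified terminus and one non-specific terminus, which matches the SemiSpecific definition given earlier in this thread.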