smith-chem-wisc / MetaMorpheus

Proteomics search software with integrated calibration, PTM discovery, bottom-up, top-down and LFQ capabilities
MIT License

Unintuitive proteoform/protein output #2061

Open zrolfs opened 3 years ago

zrolfs commented 3 years ago


I made a file that contained 8 PrSMs, one at each of the proteoform ambiguity classification levels (1, 2A, 2B, 2C, 2D, 3, 4, and 5). 8 PrSMs were reported in the PrSM output. 3 proteoforms were reported in the proteoform output. 6 proteins were reported in the ProteinGroup output.

It's a little weird to me that we only reported 3 unique proteoforms, even though we identified 8 unique proteoforms. Stranger still is the ability to identify 6 unique protein groups from only 3 unique proteoforms.

This happens because the Peptide/Proteoform output requires an unambiguous full sequence, while the ProteinGroup output only requires an unambiguous base sequence for parsimony.
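To make the counting concrete, here is a minimal Python sketch of the two filters described above. It is not MetaMorpheus code: the `Prsm` class, the `count_outputs` function, and the toy sequences are all hypothetical, and ambiguity is modeled simply as a missing (`None`) value. It does not try to reproduce the 8/3/6 counts from my file, only the mechanism that lets the protein count exceed the proteoform count.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Prsm:
    # Hypothetical model: None means the field is ambiguous for this PrSM.
    base_sequence: Optional[str]  # amino acid sequence without modifications
    full_sequence: Optional[str]  # sequence with identified, localized modifications


def count_outputs(prsms):
    """Return (PrSM count, proteoform count, protein count) under the two filters."""
    # Proteoform output: keep only PrSMs with an unambiguous full sequence.
    proteoforms = {p.full_sequence for p in prsms if p.full_sequence is not None}
    # Protein output: keep only PrSMs with an unambiguous base sequence.
    proteins = {p.base_sequence for p in prsms if p.base_sequence is not None}
    return len(prsms), len(proteoforms), len(proteins)


# Toy data (assumed, not taken from the file in this issue).
toy = [
    Prsm("PEPTIDEA", "PEPTIDEA"),  # fully unambiguous
    Prsm("PEPTIDEB", None),        # modification not localized: base sequence still known
    Prsm("PEPTIDEC", None),        # modification identity ambiguous: base sequence still known
    Prsm(None, None),              # sequence itself ambiguous
]

print(count_outputs(toy))  # (4, 1, 3)
```

In this toy case, 4 PrSMs yield 1 proteoform but 3 proteins, because a PrSM with an ambiguous modification is dropped from the proteoform output while still contributing its base sequence to the protein output.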

I'm not sure how pressing this issue is (or if it's even an issue), but it doesn't look like a quick fix.

zrolfs commented 3 years ago

I'm thinking about proteoform parsimony... Say we identified two PrSMs: A) PROTEOFORM (with unlocalized +16 mass shift) B) PROTEOFORM(Ox)

We should output a single proteoform "PROTEOFORM(Ox)" for the two, rather than reporting both.
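A rough Python sketch of what that parsimony rule could look like, assuming each PrSM is reduced to a base sequence, a total modification mass, and a flag for whether the modification is localized. The names (`Prsm`, `collapse`), the 0.01 Da tolerance, and the 15.995 Da oxidation mass used in the example are illustrative assumptions, not anything that exists in MetaMorpheus.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prsm:
    # Hypothetical, simplified representation of a PrSM.
    base_sequence: str
    mass_shift: float  # total modification mass in Da
    localized: bool    # True if the modification is identified and localized
    label: str         # display string for the output


def collapse(prsms, tol=0.01):
    """Drop unlocalized PrSMs that a localized proteoform already explains."""
    localized = [p for p in prsms if p.localized]
    kept = []
    for p in prsms:
        if not p.localized:
            # Explained if a localized proteoform shares the base sequence
            # and its modification mass matches within the tolerance.
            explained = any(
                q.base_sequence == p.base_sequence
                and abs(q.mass_shift - p.mass_shift) <= tol
                for q in localized
            )
            if explained:
                continue
        if p not in kept:
            kept.append(p)
    return kept


a = Prsm("PROTEOFORM", 15.995, False, "PROTEOFORM [+15.995 unlocalized]")
b = Prsm("PROTEOFORM", 15.995, True, "PROTEOFORM(Ox)")

print([p.label for p in collapse([a, b])])  # ['PROTEOFORM(Ox)']
```

If only PrSM A were present, it would be kept as-is, since no localized proteoform exists to absorb it. A real implementation would also need to handle multiple modifications and competing localized candidates, which this sketch ignores.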