smith-chem-wisc / MetaMorpheus

Proteomics search software with integrated calibration, PTM discovery, bottom-up, top-down and LFQ capabilities
MIT License

Proteoform parsimony #33

Open stefanks opened 7 years ago

trishorts commented 7 years ago

I think this issue should be deleted. This should not be a function of MetaMorpheus; it is a function for PS. I can see this being added back if we do dual bottom-up/top-down runs.

stefanks commented 7 years ago

I wouldn't discard this just yet, even for bottom-up! Consider a protein with unmodified sequence abc, where a, b, and c are amino acid sequences. Say that UniProt says these could be modified into A, B, and C, respectively. Then say we identify a, b, c, and A. We could then say, with reasonable certainty, that we observed proteoforms abc and Abc.
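The inference in this example can be sketched in a few lines of Python. This is only an illustration, not MetaMorpheus code: the function name `consistent_proteoforms` and the convention that lowercase means unmodified and uppercase means modified are assumptions made for the sketch.

```python
from itertools import product

def consistent_proteoforms(segments, observed):
    """Return proteoforms whose every segment form was observed.

    segments: unmodified segment names in protein order, e.g. ["a", "b", "c"]
    observed: set of identified peptide forms; uppercase = modified (assumed encoding)
    """
    # Each segment is either unmodified (lowercase) or modified (uppercase).
    options = [(s.lower(), s.upper()) for s in segments]
    result = []
    for forms in product(*options):
        # Keep a candidate only if all of its peptide forms were identified.
        if all(form in observed for form in forms):
            result.append("".join(forms))
    return result

print(consistent_proteoforms(["a", "b", "c"], {"a", "b", "c", "A"}))
# ['abc', 'Abc']
```

Of the eight theoretical combinations, only abc and Abc survive, and both are needed: a is explained only by abc, and A only by Abc.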

rmillikin commented 7 years ago

This can happen very soon with the right workflow: Bottom-up/GPTMD -> top-down to confirm specific proteoforms -> narrow list of GPTMD-assigned PTMs to those proteoforms seen by top-down -> bottom-up with the new restricted database and "treat modified peptides as unique" checked

But narrowing the list of GPTMD-assigned PTMs should probably happen in PS, I agree

stefanks commented 7 years ago

Almost! You also need to distinguish modified proteins from each other

rmillikin commented 7 years ago

Right, you would need a separate database entry for each proteoform

stefanks commented 7 years ago

That's one way. But even without that, look at my example above. That does not require prior knowledge of the proteoforms!

trishorts commented 7 years ago

[image not preserved] Problematic.

stefanks commented 7 years ago

Yes, there are very many cases where it becomes problematic. But if there are even a few that can be done, why not have it?

Besides, in this case, all of those would simply get bunched up together in one proteoform ambiguity group.
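A rough sketch of what such an ambiguity group could look like, in Python. The scenario (a, A, b, B, and c all observed) is an assumed example of a problematic case, since the image above is not available, and `minimal_explanations` is a hypothetical helper, not anything in MetaMorpheus. It brute-forces the smallest sets of proteoforms that together explain every observed peptide form.

```python
from itertools import combinations, product

def minimal_explanations(segments, observed):
    """Return all smallest sets of proteoforms that explain every observed form.

    Lowercase = unmodified segment, uppercase = modified (assumed encoding).
    """
    options = [(s.lower(), s.upper()) for s in segments]
    # Candidate proteoforms may only contain peptide forms that were observed.
    candidates = [forms for forms in product(*options)
                  if all(f in observed for f in forms)]
    for size in range(1, len(candidates) + 1):
        # A set of proteoforms is an explanation if it covers all observed forms.
        covers = [combo for combo in combinations(candidates, size)
                  if set().union(*combo) >= observed]
        if covers:
            return [sorted("".join(p) for p in combo) for combo in covers]
    return []

explanations = minimal_explanations(["a", "b", "c"], {"a", "A", "b", "B", "c"})
for group in explanations:
    print(group)
```

Here two different pairs explain the data equally well (abc with ABc, or Abc with aBc), so under this reading all four proteoforms would be reported together as one ambiguity group rather than as confident identifications.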

stefanks commented 7 years ago

And what about this: say that peptide C could carry two phosphorylations, and say we observed an unmodified C and a C modified with two phosphorylations. In this case we get some "proteoform-level" information, in the sense that we know that either both phosphorylations are present or neither one is.
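To make the linked-site argument concrete, here is a small Python sketch. The helper `consistent_site_states` is hypothetical, and it assumes that a singly phosphorylated form of the peptide would have been detected if it existed, which is a strong assumption.

```python
from itertools import product

def consistent_site_states(n_sites, observed_counts):
    """List site-occupancy states whose modification count was observed.

    n_sites: number of possible phosphorylation sites on the peptide
    observed_counts: set of phosphorylation counts seen on identified forms
    """
    return [state for state in product((False, True), repeat=n_sites)
            if sum(state) in observed_counts]

# Unmodified (0 phosphos) and doubly modified (2 phosphos) forms were observed.
print(consistent_site_states(2, {0, 2}))
# [(False, False), (True, True)]
```

The two singly modified states drop out, so the two sites are always occupied together or not at all.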

leahvschaffer commented 7 years ago

With sequence coverage being what it is, I wouldn't want to assume that we have a proteoform with that one mod even if we only saw one modified peptide from that protein... it could easily have another mod or be a truncated protein. I've found modified peptides useful when we have the intact mass: then you can say you saw the modified peptide and know that the intact mass exists with that one mod. It makes more sense to do this in Proteoform Suite, which can already match up peptides with the corresponding theoretical proteoforms based on accessions.

trishorts commented 7 years ago

This issue in particular will benefit from using multiple-protease data in a single analysis.