PTM likelihoods and scores

acesnik commented 6 years ago

I'm reading this MoFi paper right now, and it's reminding me of @veitveit's concern that we need to allow room for scores in PTMs.

I think we might need to formalize a solution for at least two categories of scores:

1) Likelihoods of localization

Perhaps PROT[Phospho|#mod:20]EOFORMS[Phospho|#mod:80], as noted in https://github.com/topdownproteomics/sdk/issues/17

2) Likelihoods of different modifications at the same position that account for the same mass difference

We have run into this problem in Proteoform Suite, and the authors run into it in this paper.
In the text they give the example (with glycans): "For instance, assume that a given residual mass may be compatible with the glycoforms A2G0F/A2G2F and A2G1F/A2G1F, whose scores are 0.7 and 0.3, respectively. Then, the former permutation will account for 70% of the peak abundance, while the latter one will explain the remaining 30%."
How should we note this type of likelihood for two different modifications at the same position?
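To make the arithmetic in that quote concrete, here is a minimal Python sketch. It assumes the scores are non-negative weights that get normalized to sum to 1 before the peak abundance is split. The function name `apportion_abundance` and the abundance value of 1000 are made up for illustration; nothing here is taken from MoFi or from the SDK.

```python
def apportion_abundance(peak_abundance, scores):
    """Split one peak's abundance across candidate assignments by score.

    `scores` maps a candidate label to a non-negative weight. The weights
    are normalized first, so 0.7/0.3 and 70/30 give the same split.
    (Hypothetical helper, not part of any existing library.)
    """
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("scores must sum to a positive value")
    return {label: peak_abundance * weight / total
            for label, weight in scores.items()}


# The two glycoform permutations from the quoted example.
shares = apportion_abundance(
    1000.0,  # illustrative abundance, not from the paper
    {"A2G0F/A2G2F": 0.7, "A2G1F/A2G1F": 0.3},
)
for label, share in shares.items():
    print(label, round(share, 1))
```

Normalizing inside the function means the notation could carry either fractions or percentages without changing the result, which may matter when choosing how to write the scores in the tag.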

Can anyone think of more categories of scores?

acesnik commented 6 years ago

In an attempt at 2), I suppose the ambiguity grouping works nicely even for the same position: PRO[A2G0F|#glycan:70][A2G1F|#glycan:30]TEOFORM
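To see how the two examples in this thread would read to a parser, here is a rough Python sketch. It assumes a tag grammar of `[Name|#group:score]` placed directly after the residue it modifies, with numeric scores. That grammar is only inferred from the examples above and is not a settled part of the notation; `parse_scored_tags` is a hypothetical helper, not an SDK function.

```python
import re
from collections import defaultdict

# Assumed tag shape: [Name|#group:score], e.g. [Phospho|#mod:20].
TAG = re.compile(r"\[([^\[\]|]+)\|#(\w+):(\d+(?:\.\d+)?)\]")


def parse_scored_tags(notation):
    """Group scored tags by their #group label.

    Returns {group: [(position, name, score), ...]}, where position is the
    1-based index of the residue that the tag follows.
    """
    groups = defaultdict(list)
    position = 0  # number of residues consumed so far
    i = 0
    while i < len(notation):
        match = TAG.match(notation, i)
        if match:
            name, group, score = match.groups()
            groups[group].append((position, name, float(score)))
            i = match.end()
        elif notation[i].isalpha():
            position += 1  # a plain residue letter
            i += 1
        else:
            raise ValueError(
                f"unexpected character {notation[i]!r} at index {i}")
    return dict(groups)


localization = parse_scored_tags(
    "PROT[Phospho|#mod:20]EOFORMS[Phospho|#mod:80]")
same_site = parse_scored_tags(
    "PRO[A2G0F|#glycan:70][A2G1F|#glycan:30]TEOFORM")
print(localization)
print(same_site)
```

In the first string the `#mod` group holds one modification at two different positions; in the second, the `#glycan` group holds two different modifications at a single position. A parser written this way cannot tell those two meanings apart from the group label alone, so the distinction would have to come from comparing positions and names within the group.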

acesnik commented 6 years ago

I'm actually confused by my own follow-up statement from last week. The # mark would usually be used for the same modification at several locations, indicating that it could be at any one of the set. The #position mark breaks that assumption, since it is used with a couple of types of modifications, as do #glycanA and #glycanB.

PTM likelihoods and scores #44