topdownproteomics / sdk

Software solution for common top-down proteomics tasks
http://www.topdownproteomics.org/
MIT License

PTM likelihoods and scores #44

Open acesnik opened 6 years ago

acesnik commented 6 years ago

I'm reading this MoFi paper right now, and it's reminding me of @veitveit's concern that we need to allow room for scores in PTMs.

I think we might need to formalize a solution for at least two categories of scores:

  1. likelihoods of localization
  2. likelihoods of different modifications at the same position that account for the same mass difference

Can anyone think of more categories of scores?

acesnik commented 6 years ago

In an attempt at 2), I suppose the ambiguity grouping works nicely even for the same position: PRO[A2G0F|#glycan:70][A2G1F|#glycan:30]TEOFORM

And with two positions, that is also possible, where the probability for the glycan is 70/30 at each site and the position likelihood is 80/20 (the #position values sum to 80 at the first site and 20 at the second): PRON[A2G0F|#glycanA:70|#position:40][A2G1F|#glycanA:30|#position:40]TEOFORN[A2G0F|#glycanB:70|#position:10][A2G1F|#glycanB:30|#position:10]
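
For anyone who wants to play with these strings, here is a minimal Python sketch of how they could be read. It is not part of the SDK: the function names (`parse_tags`, `group_totals`, `per_site`) are made up for illustration, and it assumes each tag has the shape `[NAME|#group:score|...]` with numeric scores, exactly as in the two examples above.

```python
import re
from collections import defaultdict

# Assumed tag shape (taken from the examples in this thread, not from a spec):
#   [NAME|#group:score|#group:score...]
TAG = re.compile(r"\[([^\[\]]+)\]")


def parse_tags(sequence):
    """Return (residue_index, mod_name, {group: score}) for every tag.

    residue_index is the 1-based index of the residue the tag follows.
    """
    tags = []
    residues = 0  # residue letters seen so far
    pos = 0       # end of the previous tag in the raw string
    for match in TAG.finditer(sequence):
        # Letters between the previous tag and this one are residues.
        residues += match.start() - pos
        pos = match.end()
        name, *descriptors = match.group(1).split("|")
        scores = {}
        for descriptor in descriptors:
            group, _, score = descriptor.lstrip("#").partition(":")
            scores[group] = float(score)
        tags.append((residues, name, scores))
    return tags


def group_totals(tags):
    """Sum the scores of each #group over the whole sequence."""
    totals = defaultdict(float)
    for _, _, scores in tags:
        for group, score in scores.items():
            totals[group] += score
    return dict(totals)


def per_site(tags, group):
    """Sum the scores of one #group separately for each residue index."""
    totals = defaultdict(float)
    for index, _, scores in tags:
        if group in scores:
            totals[index] += scores[group]
    return dict(totals)


two_sites = (
    "PRON[A2G0F|#glycanA:70|#position:40][A2G1F|#glycanA:30|#position:40]"
    "TEOFORN[A2G0F|#glycanB:70|#position:10][A2G1F|#glycanB:30|#position:10]"
)
tags = parse_tags(two_sites)
print(group_totals(tags))          # every group sums to 100.0
print(per_site(tags, "position"))  # {4: 80.0, 11: 20.0}
```

Summing `#position` per site recovers the 80/20 split, which suggests the 40/40 and 10/10 values are the position likelihood divided evenly between the two glycan options at each site.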

acesnik commented 6 years ago

I'm actually confused by my own follow-up statement from last week. The # mark would usually be used for the same modification at several locations, indicating that it could be at any one of the set. The #position mark breaks that assumption, since it is used with a couple of types of modifications, as do #glycanA and #glycanB.
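
To make the broken assumption concrete, here is a small Python sketch that lists every # label attached to more than one modification name. The helper names (`mods_per_group`, `mixed_groups`) are hypothetical, the tag shape `[NAME|#group:score|...]` is assumed from the examples earlier in this thread, and the `Phospho` string in the usage lines is an invented illustration rather than something proposed here.

```python
import re
from collections import defaultdict


def mods_per_group(sequence):
    """Map each #group label to the set of modification names that carry it."""
    groups = defaultdict(set)
    for body in re.findall(r"\[([^\[\]]+)\]", sequence):
        name, *descriptors = body.split("|")
        for descriptor in descriptors:
            # Drop the leading '#' and any ':score' suffix to get the label.
            label = descriptor.lstrip("#").split(":", 1)[0]
            groups[label].add(name)
    return dict(groups)


def mixed_groups(sequence):
    """Return, sorted, the labels used with more than one modification name."""
    groups = mods_per_group(sequence)
    return sorted(label for label, names in groups.items() if len(names) > 1)


two_sites = (
    "PRON[A2G0F|#glycanA:70|#position:40][A2G1F|#glycanA:30|#position:40]"
    "TEOFORN[A2G0F|#glycanB:70|#position:10][A2G1F|#glycanB:30|#position:10]"
)
print(mixed_groups(two_sites))  # ['glycanA', 'glycanB', 'position']

# Hypothetical string where one label marks the same modification at two sites.
same_mod = "PROS[Phospho|#g1]TEOFORMS[Phospho|#g1]"
print(mixed_groups(same_mod))   # []
```

In the two-site example all three labels come back as mixed, whereas a label that only marks alternative locations of one modification would not.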