We should build a way to specify ambiguity of localization in ProForma:
Modifications localized to one of several amino acid options (e.g., a phosphorylation on a T, S, or Y in a proteoform)
Regions of ambiguity, such as an unidentified mass on a fragment
Ambiguous localization along a whole proteoform sequence (see also #21)
Proposal 1: Four new keys
Add four new keys to specify ambiguity:
#, noting one of several sites that may be assigned a single modification
->, noting the left boundary of a range of the sequence to which a modification may be localized
<-, noting the right boundary of such a range
<->, noting a modification has ambiguous localization along a whole proteoform sequence, used before the first amino acid of the sequence
The value of each key-value pair is a unique string that groups the ambiguous localization sites.
Examples:
PROT[Phospho|#:eg]EOS[#:eg]FORMS[#:eg]
This sequence has a single phosphorylation with ambiguous localization among the residues tagged #:eg (here T4, S7, and S12).
Note that a residue left untagged is excluded from the group, e.g., by an identified internal fragment.
PROT[mass:19|->:A]EOSFORMS[<-:A]
This sequence has a modification with ambiguous localization across a range.
Note S7 is included in this group.
The value A of the key-value pair groups the two boundary tags. This allows overlapping regions to be disambiguated.
[mass:19|Phospho|<->:]PROTEOSFORMS
Some number of modifications are completely unlocalized, e.g., when they are observed by MS1 only.
The value of descriptors with the key "<->" can be any string (here it is empty), since the groupings are not important. (The trailing colon is kind of ugly and is addressed in Proposal 3.)
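To make the Proposal 1 rules concrete, here is a minimal parsing sketch. The function name `parse_proposal1` and its return shape are made up for illustration and are not part of ProForma. It assumes tags are never nested, that a tag annotates the residue immediately before it, and that a descriptor without a colon (such as `Phospho`) carries no ambiguity information. It does not check that a `<->` tag really sits before the first residue.

```python
from collections import defaultdict


def parse_proposal1(proforma):
    """Parse a Proposal 1 string into (sequence, sites, ranges, unlocalized)."""
    residues = []
    sites = defaultdict(list)  # group -> 1-based positions tagged with "#"
    starts, ends = {}, {}      # group -> 1-based range boundaries
    unlocalized = []           # descriptors sharing a tag with "<->"
    i = 0
    while i < len(proforma):
        if proforma[i] != "[":
            residues.append(proforma[i])
            i += 1
            continue
        close = proforma.index("]", i)  # raises ValueError on an unclosed tag
        descriptors = proforma[i + 1:close].split("|")
        i = close + 1
        position = len(residues)  # the tag annotates the preceding residue
        for descriptor in descriptors:
            key, sep, value = descriptor.partition(":")
            if not sep:
                continue  # bare descriptor such as "Phospho"
            if key == "#":
                sites[value].append(position)
            elif key == "->":
                starts[value] = position
            elif key == "<-":
                ends[value] = position
            elif key == "<->":
                unlocalized = [d for d in descriptors
                               if d.partition(":")[0] != "<->"]
    # Only groups with both a left and a right boundary form a range.
    ranges = {g: (starts[g], ends[g]) for g in starts if g in ends}
    return "".join(residues), dict(sites), ranges, unlocalized


for example in ("PROT[Phospho|#:eg]EOS[#:eg]FORMS[#:eg]",
                "PROT[mass:19|->:A]EOSFORMS[<-:A]",
                "[mass:19|Phospho|<->:]PROTEOSFORMS"):
    print(parse_proposal1(example))
```

Because ranges are keyed by their group string, two overlapping ranges with different group values stay separate in the result.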
Proposal 2: Special prefixes and suffixes
This proposal places more emphasis on human readability for annotating localization ambiguity and less emphasis on continuing the key:value structure from the first version of ProForma.
Add four special strings to group ambiguity. These are followed or preceded by a unique string grouping the ambiguous localization sites.
"#" as a prefix
"->" as a suffix
"<-" as a prefix
"<->" alone
Examples:
PROT[Phospho|#eg]EOS[#eg]FORMS[#eg]
PROT[mass:19|A->]EOSFORMS[<-A]
[mass:19|Phospho|<->]PROTEOSFORMS
Note:
There are currently no Unimod entries that contain these special prefixes, suffixes, or standalone strings, but if one were introduced, it would cause a collision.
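The prefix/suffix rules and the collision concern can be sketched as a small classifier. This is only an illustration: `classify_descriptor` and `collides` are hypothetical names, and the check order is an assumption. The standalone `<->` has to be tested first because it also starts with `<-` and ends with `->`.

```python
def classify_descriptor(descriptor):
    """Classify one Proposal 2 descriptor as a (kind, group) pair."""
    if descriptor == "<->":
        return ("unlocalized", None)
    if descriptor.startswith("<-"):
        return ("range_end", descriptor[2:])
    if descriptor.endswith("->"):
        return ("range_start", descriptor[:-2])
    if descriptor.startswith("#"):
        return ("site", descriptor[1:])
    return ("other", descriptor)  # e.g. a modification name or mass:19


def collides(modification_name):
    """True if a modification name would be misread as an ambiguity marker."""
    return classify_descriptor(modification_name)[0] != "other"


for descriptor in ("#eg", "A->", "<-A", "<->", "Phospho", "mass:19"):
    print(descriptor, classify_descriptor(descriptor))

# "#made-up-name" is an invented name, not a real Unimod entry.
print(collides("Phospho"), collides("#made-up-name"))
```

A check like `collides` could be run over a modification name list to confirm that no entry clashes with the special strings.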
Proposal 3: New keys and one special string
This proposal is a compromise between the two previous proposals: it keeps the key:value structure of Proposal 1 for localized ambiguity, but uses the special string from Proposal 2 for annotating global (unlocalized) modifications.
Add three new keys to specify ambiguity:
#, noting one of several sites that may be assigned a single modification
->, noting the left boundary of a range
<-, noting the right boundary of a range
Add one special string to specify unlocalized modifications:
<->, noting a modification has ambiguous localization along a whole proteoform sequence, used before the first amino acid of the sequence
The value of each key-value pair is a unique string that groups the ambiguous localization sites.
Examples:
PROT[Phospho|#:eg]EOS[#:eg]FORMS[#:eg]
PROT[mass:19|->:A]EOSFORMS[<-:A]
[mass:19|Phospho|<->]PROTEOSFORMS
The third example differs from its Proposal 1 counterpart only by dropping the colon character after "<->".
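Since Proposal 3 only changes how the whole-sequence marker is written, a Proposal 1 string can be rewritten mechanically. The sketch below uses a hypothetical helper, `proposal1_to_proposal3`, and assumes tags are never nested. Any value after `<->:` is discarded, since the grouping is not important for unlocalized modifications.

```python
import re


def proposal1_to_proposal3(proforma):
    """Rewrite every '<->:value' descriptor as the bare special string '<->'."""
    def rewrite(match):
        descriptors = match.group(1).split("|")
        kept = ["<->" if d.partition(":")[0] == "<->" else d
                for d in descriptors]
        return "[" + "|".join(kept) + "]"

    # Each [...] tag is rewritten independently; residues are left untouched.
    return re.sub(r"\[([^\]]*)\]", rewrite, proforma)


print(proposal1_to_proposal3("[mass:19|Phospho|<->:]PROTEOSFORMS"))
print(proposal1_to_proposal3("PROT[mass:19|->:A]EOSFORMS[<-:A]"))
```

The "#", "->", and "<-" descriptors pass through unchanged, which matches the first two examples being identical in Proposals 1 and 3.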
Maybe that was a bit much to start discussion. Here are the main questions:
Do you like using "#" and the arrows to specify ambiguity?
Do you want to keep building off of the key:value structure (easier to amend rules, Proposal 1 and 3), or do you want to use special prefixes and suffixes (easier human readability, Proposal 2)?