Closed jscottrell closed 6 years ago
jscottrell, we added these two items to the agenda for our next meeting, and will make the appropriate changes once we discuss them. Thank you!
We haven't really touched ambiguity in version 1.0. For round one we discussed a requirement that when we call a proteoform, we know the amino acid sequence. We definitely have to tackle this at some point, because almost no one ever knows the end-to-end sequence unequivocally. The amino acid codes B and Z in UniProt have caused us numerous headaches in bottom-up. I guess 'X' is there too. Here is the UniProt table. I don't see J, but I understand it.
6.1 Composition in percent for the complete database
Ala (A) 8.17 Gln (Q) 3.95 Leu (L) 9.67 Ser (S) 6.62
Arg (R) 5.50 Glu (E) 6.74 Lys (K) 5.87 Thr (T) 5.34
Asn (N) 4.07 Gly (G) 7.04 Met (M) 2.41 Trp (W) 1.09
Asp (D) 5.42 His (H) 2.28 Phe (F) 3.88 Tyr (Y) 2.93
Cys (C) 1.40 Ile (I) 5.94 Pro (P) 4.74 Val (V) 6.82
Asx (B) 0.000 Glx (Z) 0.000 Xaa (X) 0.00
Although J isn't in the IUPAC standard, NCBI nr currently contains 35,630 J
Residue  Frequency
A  4192301557
B  51330
C  545892333
D  2478417080
E  2788235552
F  1731414549
G  3313686996
H  1003343003
I  2485386623
J  35630
K  2209498580
L  4482982267
M  1043588980
N  1732651536
O  261
P  2236526114
Q  1762729851
R  2608941046
S  3045137327
T  2527536348
U  14385
V  3108487093
W  586662197
X  10248364
Y  1305860474
Z  18233
John Cottrell, Matrix Science Ltd.
Hi all,
I've been wondering whether this could be solved by using preceding tags, as introduced in Rule 6. In this case one could define a tag that specifies one of the remaining single letters, e.g. "J is ambiguous for I and L". However, I'm not sure whether this is what you have in mind with regard to simplicity.
moved discussion of rule 3 to new issue
Ambiguity is not currently part of the standard, but I believe it is a good idea to transparently allow AA-level ambiguity without compromising the standard and without introducing new notation. For ambiguous AA, see https://en.wikipedia.org/wiki/Proteinogenic_amino_acid
B: Asparagine or aspartic acid
J: Leucine or isoleucine
X: Unknown
Z: Glutamic acid or glutamine
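
To make the meaning of these codes concrete, here is a minimal Python sketch that expands a sequence containing B, J, or Z into every concrete sequence it could stand for. The names `AMBIGUITY_CODES` and `expand_ambiguous` are made up for illustration and are not part of the standard. Rejecting X rather than expanding it to all residues is an assumption of this sketch.

```python
from itertools import product

# Ambiguity codes from the list above (hypothetical helper, not part of the standard).
AMBIGUITY_CODES = {
    "B": ("N", "D"),  # asparagine or aspartic acid
    "J": ("L", "I"),  # leucine or isoleucine
    "Z": ("E", "Q"),  # glutamic acid or glutamine
}


def expand_ambiguous(sequence):
    """Return every concrete sequence consistent with the B, J and Z codes."""
    if "X" in sequence:
        # Assumption of this sketch: X (unknown) is rejected rather than expanded.
        raise ValueError("X (unknown residue) is not expanded in this sketch")
    # Each position contributes either its candidate residues or itself.
    choices = [AMBIGUITY_CODES.get(residue, (residue,)) for residue in sequence]
    return ["".join(combo) for combo in product(*choices)]
```

For example, `expand_ambiguous("PEPTJDE")` yields the two candidates `PEPTLDE` and `PEPTIDE`. The number of candidates doubles with each ambiguous position, which is one argument for keeping the single-letter code in the notation instead of enumerating alternatives.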
Thank you for this discussion, all. I am going to close this issue, since it was addressed in Rule 1 of the ProForma standard (published here).
Namely, we allowed J, B, and Z to be used. We also allowed U to denote selenocysteine and O to denote pyrrolysine. We forbade X because it is used for undetermined amino acids, whereas ProForma is intended to annotate nearly or fully characterized proteoforms.
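
As a rough illustration of the residue alphabet described above, the following Python sketch checks a bare amino acid sequence against it. It assumes the 20 standard residues plus J, B, Z, U, and O are accepted and everything else (including X) is rejected. It looks only at the residue letters and does not parse ProForma tags or modifications. The function name `find_disallowed_residues` is hypothetical.

```python
# The 20 standard residues plus the extra codes allowed per the comment above.
STANDARD_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")
EXTRA_ALLOWED = set("JBZUO")  # I/L, Asx, Glx, selenocysteine, pyrrolysine
ALLOWED_RESIDUES = STANDARD_RESIDUES | EXTRA_ALLOWED


def find_disallowed_residues(sequence):
    """Return (1-based position, residue) pairs for letters outside the allowed set.

    Hypothetical helper: it checks a bare sequence only and knows nothing
    about tags or modifications.
    """
    return [
        (position, residue)
        for position, residue in enumerate(sequence, start=1)
        if residue not in ALLOWED_RESIDUES
    ]
```

An empty result means every letter is allowed. A sequence such as `PEXTIDE` would be reported with the X at position 3.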
Section 3 Rule 1: Maybe J should be allowed since I/L ambiguity is expected in any sequence characterised solely by MS.
Section 3 Rule 5: (Also Section 5) The Unimod 'PSI-MS Name' is the preferred name. The 'Interim name' should only be used when the PSI-MS Name is empty.
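
The naming preference suggested for Rule 5 can be sketched in a few lines of Python. This is only an illustration of the fallback logic: the function name and the example strings are placeholders, not real Unimod entries, and how the two name fields are read from Unimod is left out entirely.

```python
def preferred_unimod_name(psi_ms_name, interim_name):
    """Prefer the PSI-MS Name; fall back to the Interim name only if it is empty.

    Both arguments are plain strings (or None) taken from a Unimod record;
    retrieving them is outside the scope of this sketch.
    """
    name = (psi_ms_name or "").strip()
    if name:
        return name
    return (interim_name or "").strip()


# Placeholder values, not real Unimod entries.
print(preferred_unimod_name("ExamplePsiName", "example_interim"))
print(preferred_unimod_name("", "example_interim"))
```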