Closed ValWood closed 1 month ago
how many of these are there?
About 1100 genes have one or more modified residue annotations. And there are 2636 annotations in total.
Here are the counts for each note across the 2636 annotations:
1 1-thioglycine
1 2,3-didehydroalanine (Cys)
12 N6-acetyllysine
14 N6-acetyllysine; alternate
16 N6,N6-dimethyllysine; alternate
1949 Phosphoserine
1 Diphthamide
1 Lysino-D-alanine (Lys); alternate
1 N6-(2-hydroxyisobutyryl)lysine
1 N6-glutaryllysine
1 N6-methyllysine; by autocatalysis; alternate
1 N6-methyllysine; by set13
1 N6,N6-dimethyllysine; by autocatalysis; alternate
1 N6,N6,N6-trimethyllysine; by autocatalysis; alternate
1 N6-(pyridoxal phosphate)lysine; alternate
1 Phosphoserine; by ATM
1 Phosphoserine; by autocatalysis
1 Phosphoserine; by CAK
1 Phosphoserine; by cdc2
1 Phosphoserine; by CHEK2
1 Phosphoserine; by CK2
1 Phosphoserine; by ksg1
1 Phosphoserine; by MAPK
1 Phosphoserine; by plo1
1 Phosphothreonine; by ATM
1 Phosphothreonine; by autocatalysis
1 Phosphothreonine; by ksg1
1 Phosphothreonine; by MAPK
1 Phosphothreonine; by PKB/AKT1
1 Phosphotyrosine; by autocatalysis
1 Pros-8alpha-FAD histidine
1 S-(dipyrrolylmethanemethyl)cysteine
1 S-glutathionyl cysteine
1 Tele-8alpha-FAD histidine
20 Cysteine methyl ester
2 2',4',5'-topaquinone
2 3,4-dihydroxyproline
24 N6-methyllysine; alternate
2 Hypusine
2 N5-methylarginine
2 N6-acetyl-N6-methyllysine; alternate
2 N6-biotinyllysine
2 N-acetylalanine; partial
2 N-acetylmethionine
2 Phosphohistidine
2 Phosphoserine; by TORC2
2 Phosphothreonine; by TORC1
352 Phosphothreonine
3 Leucine methyl ester
3 N6-acetyllysine; by autocatalysis
3 N6-carboxylysine
3 N,N,N-trimethylglycine
3 Phosphoserine; by CK1
40 N6-(pyridoxal phosphate)lysine
4 N5-methylglutamine
4 N6-lipoyllysine
4 N6,N6,N6-trimethyllysine
4 Phosphohistidine; by autocatalysis
4 Phosphoserine; by MAPK sty1
4 Phosphothreonine; by CDC2
4 Pyruvic acid (Ser); by autocatalysis
5 4-aspartylphosphate
63 Phosphotyrosine
7 N6-methyllysine
7 Phosphoserine; by CDC2
7 Phosphothreonine; by cdc2
8 N-acetylserine
8 O-(pantetheine 4'-phosphoryl)serine
9 N6,N6,N6-trimethyllysine; alternate
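Many of the notes above differ only in a trailing qualifier ("; by cdc2", "; alternate", "; partial"), so it may help to split them before counting or mapping. A minimal sketch, assuming qualifiers are always separated by ";" and that a part starting with "by " names the enzyme; `split_note` is a hypothetical helper, not part of any existing PomBase code:

```python
def split_note(note):
    """Split a MOD_RES note into (base name, enzyme, other flags).

    Assumption: qualifiers follow the base name, separated by ';',
    and a qualifier starting with 'by ' names the modifying enzyme.
    """
    parts = [p.strip() for p in note.split(";")]
    base = parts[0]
    enzyme = None
    flags = []
    for part in parts[1:]:
        if part.startswith("by "):
            enzyme = part[3:]
        elif part:
            flags.append(part)
    return base, enzyme, flags


example = split_note("N6-methyllysine; by autocatalysis; alternate")
```

Grouping on the base name alone would merge case variants such as "by CDC2" and "by cdc2" into a single modification name.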
@Antonialock are these modifications from experiments or inferred? I don't think I would infer phosphosites, and my concern is that they use human/cerevisiae names in the "added by". Maybe we should skip the phosphosites, because we have pretty good EXP coverage for those.
@kimrutherford do they have residues associated?
I guess the way forward
First, I think we need to extend our MOD data format file to include an "assigned by" column so that all of these have "assigned_by" UniProt
in the current annotation guidelines
so there should be an EXP associated with the annotation. Is there any non-overlap...?
Is there any non-overlap...?
We don't know yet. I checked the 8 annotations at the top of the file Kim provided (see first comment in this ticket), and we only had 6 of them. I suspect we will have most of the phosphosites, but we will filter any redundant ones.
@kimrutherford do they have residues associated?
Yep, they all have a position.
create a mapping to MOD
I had a look at the PSI-MOD OBO file to see how tricky that would be. It's not too bad because there are EXACT synonyms for most of the UniProt modification names, e.g.:
[Term]
id: MOD:00793
name: dehydroalanine (Cys)
...
synonym: "MOD_RES 2,3-didehydroalanine (Cys)" EXACT UniProt-feature []
There are 37 unique modification names we get from the UniProt file and all but 3 have a matching synonym. They are:
Phosphohistidine is likely to be this term (there is an EXACT synonym):
[Term]
id: MOD:00890
name: phosphorylated L-histidine
...
synonym: "phosphohistidine" EXACT PSI-MOD-alternate []
I'm unsure about 3,4-dihydroxyproline as there isn't an exact match. The closest is:
[Term]
id: MOD:01402
name: (2S,3R,4R)-3,4-dihydroxyproline
def: "A protein modification that effectively converts an L-proline residue to a (2S,3R,4R)-3,4-dihydroxyproline." [PubMed:6893271, RESID:AA0479, ChEBI:141805]
synonym: "(2S,3R,4R)-3,4-dihydroxyproline" EXACT RESID-name []
synonym: "(2S,3R,4R)-3,4-dihydroxypyrrolidine-2-carboxylic acid" EXACT RESID-systematic []
synonym: "2,3-trans-3,4-trans-3,4-dihydroxy-L-proline" EXACT RESID-alternate []
synonym: "2-alpha-3-beta-4-alpha-3,4-dihydroxyproline" EXACT RESID-alternate []
synonym: "MOD_RES (3R,4R)-3,4-dihydroxyproline" EXACT UniProt-feature []
I can't see anything in PSI-MOD that looks like "N6-acetyl-N6-methyllysine".
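The synonym lookup described above could be sketched roughly like this. It assumes the UniProt names appear in PSI-MOD as `synonym: "MOD_RES <name>" EXACT UniProt-feature []` lines, as in the stanzas quoted in this thread, and that matching is an exact string comparison. `uniprot_synonym_map` is a hypothetical name, and the sample text is cut down from the stanzas above:

```python
import re

# Matches the UniProt-feature synonym lines shown in the stanzas above.
SYNONYM_RE = re.compile(r'^synonym: "MOD_RES (.+)" EXACT UniProt-feature')


def uniprot_synonym_map(obo_text):
    """Map UniProt MOD_RES names to PSI-MOD IDs via EXACT synonyms."""
    mapping = {}
    term_id = None
    in_term = False
    for line in obo_text.splitlines():
        line = line.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are of interest.
            in_term = line == "[Term]"
            term_id = None
        elif in_term and line.startswith("id: "):
            term_id = line[4:]
        elif in_term and term_id:
            match = SYNONYM_RE.match(line)
            if match:
                mapping[match.group(1)] = term_id
    return mapping


SAMPLE = '''[Term]
id: MOD:00793
name: dehydroalanine (Cys)
synonym: "MOD_RES 2,3-didehydroalanine (Cys)" EXACT UniProt-feature []

[Term]
id: MOD:01402
name: (2S,3R,4R)-3,4-dihydroxyproline
synonym: "MOD_RES (3R,4R)-3,4-dihydroxyproline" EXACT UniProt-feature []
'''

mapping = uniprot_synonym_map(SAMPLE)
names = ["2,3-didehydroalanine (Cys)", "3,4-dihydroxyproline"]
unmatched = [name for name in names if name not in mapping]
```

With this exact-match approach, "3,4-dihydroxyproline" ends up in `unmatched` because the PSI-MOD synonym carries the "(3R,4R)-" prefix, which is the situation described above.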
Wow Kim you got your curator badge!
Correct for the first 2. "N6-acetyl-L-lysine" (MOD:00064): I didn't understand this one.
I asked ChatGPT and the second answer was 👍
how can both modifications occur on the same residue?
ChatGPT said: You're right to question the possibility of both acetylation and methylation occurring on the same lysine residue, as these two modifications happen at the same nitrogen atom (N6) on the lysine side chain. Normally, the acetylation and methylation of lysine are mutually exclusive because both modifications target the ε-amino group (the N6 position) of the lysine side chain.
Here’s why this is the case:
Acetylation involves attaching an acetyl group (–COCH₃) to the ε-amino group of lysine, which neutralizes its positive charge and removes the free amine group.
Methylation involves adding one, two, or three methyl groups (–CH₃) to the same ε-amino group. Methylation can happen one step at a time, forming mono-, di-, or trimethyllysine.
Since both modifications target the same nitrogen atom, once lysine is acetylated, the nitrogen is no longer available to accept methyl groups, and vice versa.
Possible Scenarios for "N6-acetyl-N6-methyllysine":
Sequential or Competitive Modification: In living cells, lysine residues can switch between acetylation and methylation. However, acetylation usually prevents methylation at the same site, and methylation prevents acetylation. It is possible that at different times, the lysine residue could first be methylated and then acetylated or the reverse, but not both modifications at once.
In Vitro or Synthetic Chemical Modification: In a controlled laboratory setting, chemists may create synthetic molecules where both modifications appear on the same lysine. These compounds can be useful to study the effects of such modifications on protein function, even though this does not naturally occur in cells.
Misnomer or Error: It could also be that the term "N6-acetyl-N6-methyllysine" is used imprecisely, referring to two distinct modification states (either acetylated or methylated lysine) without both modifications actually existing simultaneously on the same nitrogen atom.
In summary, while lysine can be either acetylated or methylated at the N6 position, both modifications cannot occur simultaneously on the same residue under normal biological conditions. The phrase "N6-acetyl-N6-methyllysine" might be more theoretical or used in contexts where sequential or competing modifications are considered.
I think we can ignore "N6-acetyl-N6-methyllysine" for now. Which protein is it on?
Wow Kim you got your curator badge!
:-)
I think we can ignore "N6-acetyl-N6-methyllysine" for now. Which protein is it on?
https://www.pombase.org/gene/SPAC1834.03c
K6: https://www.uniprot.org/uniprotkb/P09322/feature-viewer
It has a reference: https://pubmed.ncbi.nlm.nih.gov/37731000/
The changes to load the modifications from UniProt into Chado as PSI-MOD annotation are mostly done. I've still got a bit of testing and configuration to do so I won't commit the changes today.
Also I'd like to get these two issues finished and deployed at the same time:
For testing, I have a version with just the UniProt features and the PomBase curated modifications here: https://desktop.kmr.nz/reference/PMID:36408920
There is quite a bit of redundancy in the new modifications. You've curated most of the modifications already so we should think about adding some filtering.
that's good to know!
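One possible shape for the redundancy filtering mentioned above, as a sketch only: it assumes an annotation counts as redundant when the gene, residue position and PSI-MOD term all match an existing curated annotation. The dict layout, the `filter_redundant` name and the records below are made up for illustration and are not real annotations:

```python
def annotation_key(annotation):
    """Key used to decide whether two annotations are the same."""
    return (annotation["gene"], annotation["position"], annotation["term"])


def filter_redundant(uniprot_annotations, curated_annotations):
    """Keep only the UniProt annotations that are not already curated."""
    curated_keys = {annotation_key(a) for a in curated_annotations}
    return [a for a in uniprot_annotations
            if annotation_key(a) not in curated_keys]


# Illustrative records only (hypothetical genes and positions).
curated = [{"gene": "geneA", "position": 10, "term": "MOD:00890"}]
from_uniprot = [
    {"gene": "geneA", "position": 10, "term": "MOD:00890"},  # already curated
    {"gene": "geneA", "position": 25, "term": "MOD:00890"},  # new position
    {"gene": "geneB", "position": 10, "term": "MOD:00793"},  # new gene
]

new_only = filter_redundant(from_uniprot, curated)
```

A stricter or looser key (for example, treating a curated child term as covering its parent) would change what counts as redundant, so the key function is kept separate.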
I think we can close this now. The modifications are in Chado (so appear in the Modifications section on gene pages) and they are in the protein feature viewer.
From https://github.com/pombase/pombase-chado/issues/52#issuecomment-2330349791
Here is a sample of the "Modified residue" data:
SPAC144.13c; │ MOD_RES 62; /note="Phosphoserine"; /evidence="ECO:0000269|PubMed:10921878"
SPBC428.11; │ MOD_RES 210; /note="N6-(pyridoxal phosphate)lysine"; /evidence="ECO:0000250|UniProtKB:P06721"
SPAC22A12.07c; │ MOD_RES 451; /note="Phosphothreonine"; /evidence="ECO:0000269|PubMed:18257517"
SPAC23C4.08; │ MOD_RES 202; /note="Cysteine methyl ester"; /evidence="ECO:0000250|UniProtKB:P62745"
SPAC2F3.09; │ MOD_RES 377; /note="N6-(pyridoxal phosphate)lysine"; /evidence="ECO:0000250|UniProtKB:P18079"
SPAC31G5.15; │ MOD_RES 912; /note="Pyruvic acid (Ser); by autocatalysis"; /evidence="ECO:0000255|HAMAP-Rule:MF_03209"
SPBC428.02c; │ MOD_RES 256; /note="N6-(pyridoxal phosphate)lysine"; /evidence="ECO:0000250"
SPAC10F6.09c; │ MOD_RES 105; /note="N6-acetyllysine"; /evidence="ECO:0000250"
SPAC23C4.08; │ MOD_RES 202; /note="Cysteine methyl ester"; /evidence="ECO:0000250|UniProtKB:P62745"
SPAC10F6.09c; │ MOD_RES 105; /note="N6-acetyllysine"; /evidence="ECO:0000250"
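For reference, records in the format of the sample above can be pulled apart with a regular expression. This is a sketch that assumes every record has exactly the gene, MOD_RES position, /note and /evidence fields in that order; `parse_mod_res` is a hypothetical helper. The `experimental` flag relies on ECO:0000269 being the code UniProt uses for experimental evidence in manual assertions, while ECO:0000250 marks inference by sequence similarity:

```python
import re

# One record: gene; │ MOD_RES <pos>; /note="..."; /evidence="..."
RECORD_RE = re.compile(
    r'(\S+);\s*│\s*MOD_RES\s+(\d+);\s*/note="([^"]*)";\s*/evidence="([^"]*)"'
)


def parse_mod_res(text):
    """Return a list of dicts, one per MOD_RES record found in text."""
    records = []
    for gene, position, note, evidence in RECORD_RE.findall(text):
        # Evidence is 'ECO code' optionally followed by '|source'.
        eco, _, source = evidence.partition("|")
        records.append({
            "gene": gene,
            "position": int(position),
            "note": note,
            "eco": eco,
            "source": source or None,
            "experimental": eco == "ECO:0000269",
        })
    return records


SAMPLE = (
    'SPAC144.13c; │ MOD_RES 62; /note="Phosphoserine"; '
    '/evidence="ECO:0000269|PubMed:10921878" '
    'SPAC10F6.09c; │ MOD_RES 105; /note="N6-acetyllysine"; '
    '/evidence="ECO:0000250"'
)

records = parse_mod_res(SAMPLE)
```

Because the pattern is searched over the whole text, it works whether the records arrive one per line or run together on a single line.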
If we can do a mapping for the terms, we can add these (we will decide once we have the numbers). We could import only the ones for which our sequence matches UniProt.
We could also add an additional check to make sure the residues are sensible (this would be a useful QC check anyway), i.e. phosphoserine only on serine.
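The residue check could look something like the sketch below. The `EXPECTED_RESIDUES` table covers only a handful of modification names as an illustration and would need extending; `check_residue` is a hypothetical name, and positions are assumed to be 1-based as in the UniProt feature table:

```python
# Expected one-letter amino acid for a few modification names
# (illustrative subset only, not a complete table).
EXPECTED_RESIDUES = {
    "Phosphoserine": "S",
    "Phosphothreonine": "T",
    "Phosphotyrosine": "Y",
    "N6-acetyllysine": "K",
}


def check_residue(sequence, position, note):
    """Check that the residue at a 1-based position fits the modification.

    Returns True or False, or None when there is no rule for the note.
    """
    # Ignore qualifiers such as '; by cdc2' or '; alternate'.
    base = note.split(";")[0].strip()
    expected = EXPECTED_RESIDUES.get(base)
    if expected is None:
        return None
    if not 1 <= position <= len(sequence):
        # Position is outside the sequence, e.g. sequences differ.
        return False
    return sequence[position - 1] == expected


ok = check_residue("MSTK", 2, "Phosphoserine; by cdc2")
bad = check_residue("MSTK", 3, "Phosphoserine")
```

Returning None for names without a rule keeps unknown modifications distinguishable from real mismatches, which may be useful when reporting QC results.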