pombase / pombase-chado

PomBase code for accessing Chado

Possible modifications to import #1216

Closed ValWood closed 1 month ago

ValWood commented 2 months ago

From https://github.com/pombase/pombase-chado/issues/52#issuecomment-2330349791

Here is a sample of the "Modified residue" data:

SPAC144.13c; │ MOD_RES 62; /note="Phosphoserine"; /evidence="ECO:0000269|PubMed:10921878"
SPBC428.11; │ MOD_RES 210; /note="N6-(pyridoxal phosphate)lysine"; /evidence="ECO:0000250|UniProtKB:P06721"
SPAC22A12.07c; │ MOD_RES 451; /note="Phosphothreonine"; /evidence="ECO:0000269|PubMed:18257517"
SPAC23C4.08; │ MOD_RES 202; /note="Cysteine methyl ester"; /evidence="ECO:0000250|UniProtKB:P62745"
SPAC2F3.09; │ MOD_RES 377; /note="N6-(pyridoxal phosphate)lysine"; /evidence="ECO:0000250|UniProtKB:P18079"
SPAC31G5.15; │ MOD_RES 912; /note="Pyruvic acid (Ser); by autocatalysis"; /evidence="ECO:0000255|HAMAP-Rule:MF_03209"
SPBC428.02c; │ MOD_RES 256; /note="N6-(pyridoxal phosphate)lysine"; /evidence="ECO:0000250"
SPAC10F6.09c; │ MOD_RES 105; /note="N6-acetyllysine"; /evidence="ECO:0000250"

  1. @kimrutherford how many of these are there? They are a useful source. I checked this set and we only had:

SPAC23C4.08; │ MOD_RES 202; /note="Cysteine methyl ester"; /evidence="ECO:0000250|UniProtKB:P62745"
SPAC10F6.09c; │ MOD_RES 105; /note="N6-acetyllysine"; /evidence="ECO:0000250"

  2. If we can do a mapping for the terms, we can add these (will decide once we have the numbers). We could import only the ones for which our sequence matches UniProt.

  3. We could also add an additional check to make sure the residues are sensible (this would be a useful QC check anyway), i.e. phosphoserine only on serine.
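
The residue check suggested in the last point could be sketched roughly as follows. This is only an illustration in Python, not the importer's code: the function names, the regular expression and the small `EXPECTED_RESIDUE` table are assumptions made for this example, and the toy sequence is not a real protein.

```python
import re

# Illustrative table (an assumption for this sketch): which one-letter
# residue each modification name is expected to sit on.
EXPECTED_RESIDUE = {
    "Phosphoserine": "S",
    "Phosphothreonine": "T",
    "Phosphotyrosine": "Y",
    "N6-acetyllysine": "K",
}

# Matches lines shaped like the sample above:
#   GENE; │ MOD_RES <position>; /note="<name>"; ...
LINE_RE = re.compile(
    r'^(?P<gene>\S+);\s*│\s*MOD_RES\s+(?P<pos>\d+);\s*/note="(?P<note>[^"]*)"'
)


def parse_mod_res(line):
    """Return (gene, position, note) for a MOD_RES line, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return m.group("gene"), int(m.group("pos")), m.group("note")


def residue_is_sensible(sequence, pos, note):
    """Check that the residue at 1-based `pos` fits the modification name.

    Returns True or False when a rule exists, and None when the table has
    no rule for this modification. Qualifiers such as "; by CK2" are ignored.
    """
    base = note.split(";")[0].strip()
    expected = EXPECTED_RESIDUE.get(base)
    if expected is None:
        return None
    if pos < 1 or pos > len(sequence):
        return False
    return sequence[pos - 1] == expected
```

A position outside the sequence is reported as not sensible, which would also flag cases where the PomBase and UniProt sequences differ in length.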

kimrutherford commented 1 month ago

how many of these are there?

About 1100 genes have one or more modified residue annotations. And there are 2636 annotations in total.

Here are the counts of each note for the 2636 annotations:

      1 1-thioglycine
      1 2,3-didehydroalanine (Cys)
     12 N6-acetyllysine
     14 N6-acetyllysine; alternate
     16 N6,N6-dimethyllysine; alternate
   1949 Phosphoserine
      1 Diphthamide
      1 Lysino-D-alanine (Lys); alternate
      1 N6-(2-hydroxyisobutyryl)lysine
      1 N6-glutaryllysine
      1 N6-methyllysine; by autocatalysis; alternate
      1 N6-methyllysine; by set13
      1 N6,N6-dimethyllysine; by autocatalysis; alternate
      1 N6,N6,N6-trimethyllysine; by autocatalysis; alternate
      1 N6-(pyridoxal phosphate)lysine; alternate
      1 Phosphoserine; by ATM
      1 Phosphoserine; by autocatalysis
      1 Phosphoserine; by CAK
      1 Phosphoserine; by cdc2
      1 Phosphoserine; by CHEK2
      1 Phosphoserine; by CK2
      1 Phosphoserine; by ksg1
      1 Phosphoserine; by MAPK
      1 Phosphoserine; by plo1
      1 Phosphothreonine; by ATM
      1 Phosphothreonine; by autocatalysis
      1 Phosphothreonine; by ksg1
      1 Phosphothreonine; by MAPK
      1 Phosphothreonine; by PKB/AKT1
      1 Phosphotyrosine; by autocatalysis
      1 Pros-8alpha-FAD histidine
      1 S-(dipyrrolylmethanemethyl)cysteine
      1 S-glutathionyl cysteine
      1 Tele-8alpha-FAD histidine
     20 Cysteine methyl ester
      2 2',4',5'-topaquinone
      2 3,4-dihydroxyproline
     24 N6-methyllysine; alternate
      2 Hypusine
      2 N5-methylarginine
      2 N6-acetyl-N6-methyllysine; alternate
      2 N6-biotinyllysine
      2 N-acetylalanine; partial
      2 N-acetylmethionine
      2 Phosphohistidine
      2 Phosphoserine; by TORC2
      2 Phosphothreonine; by TORC1
    352 Phosphothreonine
      3 Leucine methyl ester
      3 N6-acetyllysine; by autocatalysis
      3 N6-carboxylysine
      3 N,N,N-trimethylglycine
      3 Phosphoserine; by CK1
     40 N6-(pyridoxal phosphate)lysine
      4 N5-methylglutamine
      4 N6-lipoyllysine
      4 N6,N6,N6-trimethyllysine
      4 Phosphohistidine; by autocatalysis
      4 Phosphoserine; by MAPK sty1
      4 Phosphothreonine; by CDC2
      4 Pyruvic acid (Ser); by autocatalysis
      5 4-aspartylphosphate
     63 Phosphotyrosine
      7 N6-methyllysine
      7 Phosphoserine; by CDC2
      7 Phosphothreonine; by cdc2
      8 N-acetylserine
      8 O-(pantetheine 4'-phosphoryl)serine
      9 N6,N6,N6-trimethyllysine; alternate
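
Many of the notes above differ only in a qualifier after the first semicolon ("; alternate", "; by cdc2", "; partial"). A minimal Python sketch of collapsing the counts onto the base modification name is shown here; the helper names are made up for this example and the excerpt is just five lines copied from the list.

```python
from collections import Counter


def base_name(note):
    """Drop everything after the first ';' (qualifiers such as 'alternate' or 'by cdc2')."""
    return note.split(";", 1)[0].strip()


def collapse_counts(lines):
    """Sum 'count note' lines by base modification name."""
    totals = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # The count is the first whitespace-separated field; the rest is the note.
        count, note = line.split(None, 1)
        totals[base_name(note)] += int(count)
    return totals


excerpt = """
     12 N6-acetyllysine
     14 N6-acetyllysine; alternate
      3 N6-acetyllysine; by autocatalysis
     63 Phosphotyrosine
      1 Phosphotyrosine; by autocatalysis
"""

totals = collapse_counts(excerpt.splitlines())
print(totals["N6-acetyllysine"], totals["Phosphotyrosine"])  # prints: 29 64
```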

ValWood commented 1 month ago

@Antonialock are these modifications from experiments or inferred? I don't think I would infer phosphosites, but my concern is that they use human/cerevisiae names in the "added by". Maybe we should skip the phosphosites because we have pretty good EXP coverage for those.

@kimrutherford do they have residues associated?

ValWood commented 1 month ago

I guess the way forward is:

  1. create a mapping to MOD
  2. create the annotation file and filter the ones we already have
  3. spot check what is left to see if there are any we would like to eliminate

First, I think we need to extend our MOD data format file to include an "assigned by" column so that all of these have "assigned_by" UniProt
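
As a rough sketch of step 2 and the "assigned by" idea, the filtering could look something like this in Python. The record layout, the field names and the `filter_new` helper are assumptions for illustration only (the real MOD data format file is not shown in this ticket), and the gene names in the usage lines are placeholders.

```python
from typing import NamedTuple


class ModAnnotation(NamedTuple):
    # Hypothetical record layout, not the real PomBase file format.
    gene: str
    term_id: str      # PSI-MOD term ID, e.g. "MOD:00064"
    position: int     # 1-based residue position
    assigned_by: str  # e.g. "UniProt" or "PomBase"


def filter_new(uniprot, curated):
    """Keep UniProt annotations whose (gene, term, position) is not already curated."""
    seen = {(a.gene, a.term_id, a.position) for a in curated}
    return [a for a in uniprot if (a.gene, a.term_id, a.position) not in seen]


# Placeholder data, invented for this example.
curated = [ModAnnotation("GENE_A", "MOD:00064", 105, "PomBase")]
uniprot = [
    ModAnnotation("GENE_A", "MOD:00064", 105, "UniProt"),
    ModAnnotation("GENE_B", "MOD:00890", 12, "UniProt"),
]
remaining = filter_new(uniprot, curated)  # only the GENE_B annotation is left
```

The comparison deliberately ignores `assigned_by`, so an annotation already curated by PomBase is treated as redundant even though the UniProt copy carries a different source. What is left over is the set to spot check in step 3.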

Antonialock commented 1 month ago

In the current annotation guidelines:

  1. phosphosites are not inferred from cerevisiae (I don't think they ever were)
  2. we should use the pombe kinase gene name in the "; by" field (it looks like this has not always been the case)

Antonialock commented 1 month ago

so there should be an EXP associated with the annotation. Is there any non-overlap...?

ValWood commented 1 month ago

Is there any non-overlap...?

We don't know yet. I checked the 8 annotations at the top of the file Kim provided (see first comment in this ticket), and we only had 6 of them. I suspect we will have most of the phosphosites, but we will filter any redundant ones.

kimrutherford commented 1 month ago

@kimrutherford do they have residues associated?

Yep, they all have a position.

kimrutherford commented 1 month ago

create a mapping to MOD

I had a look at the PSI-MOD OBO file to see how tricky that would be. It's not too bad because there are EXACT synonyms for most of the UniProt modification names, e.g.:

[Term]
id: MOD:00793
name: dehydroalanine (Cys)
...
synonym: "MOD_RES 2,3-didehydroalanine (Cys)" EXACT UniProt-feature []
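
To illustrate how those synonyms could drive the mapping, here is a small Python sketch that scans OBO text for EXACT synonyms tagged `UniProt-feature` and builds a lookup from the UniProt name to the MOD ID. It is a hand-rolled line scanner written for this example (the function name and regular expression are assumptions), not a full OBO parser and not the code used for the import.

```python
import re

# Synonym lines of the form shown above:
#   synonym: "MOD_RES <name>" EXACT UniProt-feature []
SYN_RE = re.compile(r'^synonym: "MOD_RES (?P<name>.*)" EXACT UniProt-feature \[')


def uniprot_synonym_map(obo_text):
    """Map UniProt MOD_RES names to PSI-MOD term IDs using EXACT synonyms."""
    mapping = {}
    term_id = None
    in_term = False
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are of interest.
            in_term = line == "[Term]"
            term_id = None
        elif in_term and line.startswith("id: "):
            term_id = line[4:].strip()
        elif in_term and term_id:
            m = SYN_RE.match(line)
            if m:
                mapping[m.group("name")] = term_id
    return mapping


sample = '''
[Term]
id: MOD:00793
name: dehydroalanine (Cys)
synonym: "MOD_RES 2,3-didehydroalanine (Cys)" EXACT UniProt-feature []
'''

mapping = uniprot_synonym_map(sample)
print(mapping["2,3-didehydroalanine (Cys)"])  # prints: MOD:00793
```

Names with no `UniProt-feature` synonym simply do not appear in the mapping, so they can be listed and resolved by hand.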

There are 37 unique modification names we get from the UniProt file and all but 3 have a matching synonym. The three without a match are:

Phosphohistidine is likely to be this term (there is an EXACT synonym):

[Term]
id: MOD:00890
name: phosphorylated L-histidine
...
synonym: "phosphohistidine" EXACT PSI-MOD-alternate []

I'm unsure about 3,4-dihydroxyproline as there isn't an exact match. The closest is:

[Term]
id: MOD:01402
name: (2S,3R,4R)-3,4-dihydroxyproline
def: "A protein modification that effectively converts an L-proline residue to a (2S,3R,4R)-3,4-dihydroxyproline." [PubMed:6893271, RESID:AA0479, ChEBI:141805]
synonym: "(2S,3R,4R)-3,4-dihydroxyproline" EXACT RESID-name []
synonym: "(2S,3R,4R)-3,4-dihydroxypyrrolidine-2-carboxylic acid" EXACT RESID-systematic []
synonym: "2,3-trans-3,4-trans-3,4-dihydroxy-L-proline" EXACT RESID-alternate []
synonym: "2-alpha-3-beta-4-alpha-3,4-dihydroxyproline" EXACT RESID-alternate []
synonym: "MOD_RES (3R,4R)-3,4-dihydroxyproline" EXACT UniProt-feature []

I can't see anything in PSI-MOD that looks like "N6-acetyl-N6-methyllysine".

ValWood commented 1 month ago

Wow Kim, you got your curator badge!

Correct for the first 2. "N6-acetyl-L-lysine" (MOD:00064): I didn't understand this one.

I asked ChatGPT and the second answer was 👍

how can both modifications occur on the same residue?

ChatGPT said:

You're right to question the possibility of both acetylation and methylation occurring on the same lysine residue, as these two modifications happen at the same nitrogen atom (N6) on the lysine side chain. Normally, the acetylation and methylation of lysine are mutually exclusive because both modifications target the ε-amino group (the N6 position) of the lysine side chain.

Here’s why this is the case:

Acetylation involves attaching an acetyl group (–COCH₃) to the ε-amino group of lysine, which neutralizes its positive charge and removes the free amine group.

Methylation involves adding one, two, or three methyl groups (–CH₃) to the same ε-amino group. Methylation can happen one step at a time, forming mono-, di-, or trimethyllysine.

Since both modifications target the same nitrogen atom, once lysine is acetylated, the nitrogen is no longer available to accept methyl groups, and vice versa.

Possible Scenarios for "N6-acetyl-N6-methyllysine":

Sequential or Competitive Modification: In living cells, lysine residues can switch between acetylation and methylation. However, acetylation usually prevents methylation at the same site, and methylation prevents acetylation. It is possible that at different times, the lysine residue could first be methylated and then acetylated or the reverse, but not both modifications at once.

In Vitro or Synthetic Chemical Modification: In a controlled laboratory setting, chemists may create synthetic molecules where both modifications appear on the same lysine. These compounds can be useful to study the effects of such modifications on protein function, even though this does not naturally occur in cells.

Misnomer or Error: It could also be that the term "N6-acetyl-N6-methyllysine" is used imprecisely, referring to two distinct modification states (either acetylated or methylated lysine) without both modifications actually existing simultaneously on the same nitrogen atom.

In summary, while lysine can be either acetylated or methylated at the N6 position, both modifications cannot occur simultaneously on the same residue under normal biological conditions. The phrase "N6-acetyl-N6-methyllysine" might be more theoretical or used in contexts where sequential or competing modifications are considered.

ValWood commented 1 month ago

I think we can ignore "N6-acetyl-N6-methyllysine" for now. Which protein is it on?

kimrutherford commented 1 month ago

Wow Kim, you got your curator badge!

:-)

I think we can ignore "N6-acetyl-N6-methyllysine" for now. Which protein is it on?

https://www.pombase.org/gene/SPAC1834.03c

K6: https://www.uniprot.org/uniprotkb/P09322/feature-viewer

It has a reference: https://pubmed.ncbi.nlm.nih.gov/37731000/

kimrutherford commented 1 month ago

The changes to load the modifications from UniProt into Chado as PSI-MOD annotation are mostly done. I've still got a bit of testing and configuration to do so I won't commit the changes today.

Also I'd like to get these two issues finished and deployed at the same time:

For testing, I have a version with just the UniProt features and the PomBase curated modifications here: https://desktop.kmr.nz/reference/PMID:36408920

There is quite a bit of redundancy in the new modifications. You've curated most of the modifications already so we should think about adding some filtering.

ValWood commented 1 month ago

There is quite a bit of redundancy in the new modifications. You've curated most of the modifications already so we should think about adding some filtering.

that's good to know!

kimrutherford commented 1 month ago

I think we can close this now. The modifications are in Chado (so appear in the Modifications section on gene pages) and they are in the protein feature viewer.