pombase / pombase-chado

PomBase code for accessing Chado
MIT License

Annotated Residues as GO MF extensions, and propagating to other places #1194

Open ValWood opened 1 month ago

ValWood commented 1 month ago
kimrutherford commented 1 month ago

What do we have annotated as "residue extensions"?

I cheated and looked in the GAF file rather than querying Chado. There are 15 annotations with a residue() or modified_residue() extension.
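For anyone wanting to repeat the check, here is a minimal sketch of that kind of scan. It assumes a standard tab-separated GAF 2.x file, where the annotation extension is column 16; the function name `residue_annotations` is made up for illustration and is not part of pombase-chado.

```python
import re

# Assumption: GAF 2.x layout, annotation extension in column 16 (index 15).
EXTENSION_COLUMN = 15

# Matches residue(...) or modified_residue(...) and captures the value inside.
RESIDUE_RE = re.compile(r"\b(?:modified_)?residue\(([^)]*)\)")


def residue_annotations(gaf_lines):
    """Yield (gene_id, go_term, residue_values) for lines with a residue extension."""
    for line in gaf_lines:
        if line.startswith("!"):  # GAF header and comment lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= EXTENSION_COLUMN:
            continue  # line too short to carry an extension column
        residues = RESIDUE_RE.findall(fields[EXTENSION_COLUMN])
        if residues:
            # Column 2 is the gene/product ID, column 5 is the GO term ID.
            yield fields[1], fields[4], residues
```

Passing an open file handle (or any iterable of lines) and counting the results should give the number of annotations carrying one of the two relations.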

8 of those are protein binding annotations:

PomBase SPAC12B10.10    nod1            GO:0005515      PMID:23966468   IPI     PomBase:SPAC31A2.16     F       medial cortical node Gef2-related protein Nod1            protein taxon:4896      20141023        PomBase residue(957-1101)
PomBase SPAC31A2.16     gef2            GO:0005515      PMID:23966468   IPI     PomBase:SPAC12B10.10    F       RhoGEF Gef2             protein taxon:4896        20131105        PomBase residue(329-419)
PomBase SPAC6F6.16c     tpz1            GO:0005515      PMID:24013504   IPI     PomBase:SPAC19G12.13c   F       shelterin complex subunit Tpz1  SPAC6F6.18c|mug169        protein taxon:4896      20131112        PomBase residue(426-450)
PomBase SPAC6F6.16c     tpz1            GO:0005515      PMID:24013504   IPI     PomBase:SPAC26H5.06     F       shelterin complex subunit Tpz1  SPAC6F6.18c|mug169        protein taxon:4896      20131112        PomBase residue(488-499)
PomBase SPBC27B12.02    mis19           GO:0005515      PMID:24774534   IPI     PomBase:SPCC970.12      F       kinetochore protein Mis19/Eic1  SPBC30B4.10|eic1|kis1     protein taxon:4896      20140605        PomBase residue(4-63)
PomBase SPBC27B12.02    mis19           GO:0005515      PMID:24774534   IPI     PomBase:SPCC1672.10     F       kinetochore protein Mis19/Eic1  SPBC30B4.10|eic1|kis1     protein taxon:4896      20140605        PomBase residue(53-112)
PomBase SPCC74.02c      ppn1            GO:0005515      PMID:33711009   IPI     PomBase:SPBC776.02c     F       mRNA cleavage and polyadenylation specificity factor complex associated protein (PNUTS)           protein taxon:4896      20221007        PomBase residue(506639)
PomBase SPCC74.02c      ppn1            GO:0005515      PMID:33711009   IPI     PomBase:SPAC824.04      F       mRNA cleavage and polyadenylation specificity factor complex associated protein (PNUTS)           protein taxon:4896      20221007        PomBase residue(506639)

Here are the other 7:

PomBase SPAC22E12.09c   krp1            GO:0004252      PMID:9418887    IMP             F       kexin   krp     protein taxon:4896      20111021 PomBase  has_input(PomBase:SPAC22E12.09c),part_of(GO:0016485),residue(S371)
PomBase SPBC428.08c     clr4            GO:0043130      PMID:34524082   EXP             F       histone lysine H3-K9 methyltransferase (Suv39) Clr4               protein taxon:4896      20211108        PomBase residue(243-261)
PomBase SPCC1672.06c    asp1            GO:0016887      PMID:35536002   IDA             F       diphosphoinositol pentakisphosphate kinase/IP8 pyrophosphatase    vip1    protein taxon:4896      20221012        PomBase residue(1-385)
PomBase SPCC1672.06c    asp1            GO:0052723      PMID:35536002   IDA             F       diphosphoinositol pentakisphosphate kinase/IP8 pyrophosphatase    vip1    protein taxon:4896      20221012        PomBase residue(1-385)
PomBase SPCC4B3.15      mid1            GO:0008289      PMID:15572668   EXP             F       anillin-related medial ring protein Mid1        dmf1      protein taxon:4896      20240412        PomBase residue(681-688)
PomBase SPNCRNA.530     sno530          GO:0030563      PMID:37403782   EXP             F       small nucleolar RNA sno530              sncRNA  taxon:4896        20230710        PomBase has_input(PomBase:SPSNRNA.06),modified_residue(A64),part_of(GO:0016180)
PomBase SPSNORNA.25     snoZ30          GO:0030563      PMID:37403782   EXP             F       C/D containing snoRNA Z30       mgU6-47 snoRNA  taxon:4896        20230710        PomBase has_input(PomBase:SPSNRNA.06),modified_residue(A41),part_of(GO:0016180)
PCarme commented 1 month ago

I looked at the extensions on the protein-binding annotations. For most of them, the residues indicated are on the annotated protein, which is what we want. The only exceptions are the nod1 and gef2 annotations, which are reciprocally inverted: the residues in the nod1 annotation are on gef2, and the residues in the gef2 annotation are on nod1. That said, given the way the annotation is worded, it might make more sense as it is now.

image

I think a user who reads this annotation (on nod1) as a sentence would understand it as "nod1 binds to the site at residues 957-1101 of gef2", which is the correct reading.

PCarme commented 1 month ago

For the 6 others (asp1 is there twice):

kimrutherford commented 1 month ago

Add an extension column to the SO file so we can store residues (this is needed for all features in the "SO" file)

That's done now. The columns in the file are:

"optional" means that the field can be blank but still needs to be there, surrounded by tabs. The exceptions are qualifiers and extension, which can be left off for convenience.

Note that there aren't any qualifiers in manual_so_term_annotations.tsv so two tabs are needed between the date and extension values.
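To illustrate the tab rule, here is a rough sketch of how a line could be assembled. Only the ordering date, qualifiers, extension at the end of the line is taken from the description above; the helper name `so_annotation_line`, the `leading_fields` argument and the example values are placeholders, not the real column list.

```python
def so_annotation_line(leading_fields, date, qualifiers="", extension=""):
    """Build one tab-separated line ending in date, qualifiers, extension.

    leading_fields stands in for whatever columns come before the date
    (hypothetical here). A blank qualifiers field is kept as an empty string
    when an extension follows it, which is what produces the two adjacent tabs.
    """
    optional_tail = [qualifiers, extension]
    # Trailing blank qualifiers/extension can be left off for convenience.
    while optional_tail and optional_tail[-1] == "":
        optional_tail.pop()
    return "\t".join([*leading_fields, date, *optional_tail])


# Placeholder values: no qualifiers, so two tabs separate date and extension.
line = so_annotation_line(["SPAC22E12.09c", "SO:0100011"], "20240412",
                          extension="residue(1-24)")
```

With no qualifiers and no extension, the line simply ends at the date.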

The only currently supported extension relation that makes sense here is residue, so the extension could be residue(1-24). We can add other extension relations.
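As a sketch of how a residue value might be validated on loading, the snippet below handles the forms that appear in this thread: a range such as 1-24 and a single position with a one-letter prefix such as S371. The function name `parse_residue` and the optional `max_position` check are assumptions for illustration, not the loader's actual behaviour.

```python
import re

# Optional one-letter residue code, a start position, and an optional end.
RESIDUE_VALUE_RE = re.compile(r"([A-Z])?(\d+)(?:-(\d+))?")


def parse_residue(value, max_position=None):
    """Parse '1-24' or 'S371' into (letter_or_None, start, end).

    max_position is a hypothetical sequence length used to reject
    positions that fall outside the feature.
    """
    match = RESIDUE_VALUE_RE.fullmatch(value)
    if match is None:
        raise ValueError(f"unrecognised residue value: {value!r}")
    letter = match.group(1)
    start = int(match.group(2))
    # A single position is treated as a range of length one.
    end = int(match.group(3)) if match.group(3) else start
    if start < 1 or end < start:
        raise ValueError(f"invalid residue range: {value!r}")
    if max_position is not None and end > max_position:
        raise ValueError(f"residue {value!r} is beyond position {max_position}")
    return letter, start, end
```

Passing a sequence length as `max_position` is one way to catch a range whose hyphen has been lost, since the fused number would usually exceed the length.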

ValWood commented 1 month ago

krp1: we curate cleavage sites as protein features (https://www.pombase.org/term/SO:0100011 "cleaved peptide region"), so the krp1 annotation should migrate to this; then it can be displayed in the feature viewer.

ValWood commented 1 month ago

asp1: the extension comes from the mapping of 2 enzymatic activities of asp1 to the N-terminal portion of asp1, based on in vitro assays using partial recombinant protein. Is it very relevant to capture this information in that way? Maybe we should reflect it as phenotype annotations.

Agreed, we can remove these. I think this was an attempt to show which domain of the multifunctional protein had which activity, but people can figure this out from the other data (including the domain viewer).

ValWood commented 1 month ago

sno530 and snoZ30: the extension refers to modifications applied by these ncRNAs on snu6. Should this type of extension be featured in the GO annotation for the activity and appear on the gene page for the modifier, or only as a modification on the target gene page? At the moment these don't appear anywhere on the snu6 page.

Well observed. There is no RNA modification ontology at the moment. We have a ticket to note them: https://github.com/pombase/curation/issues/129. We did consider doing this, but it's a bit out of scope for us, and one of the RNA groups says they have one; it has just not appeared in the OBO Foundry. I will ask about this. So, for the moment, I think this is all we can do.

ValWood commented 1 month ago

We will fix the edge cases that we can, but hold off on migrating some protein features to manual_so_term_annotations.tsv until the extensions column is read.

Update: these will be read and displayed, but they won't yet show up in the protein feature viewer.

PCarme commented 1 month ago

I have fixed the krp1 and asp1 annotations.