Open ValWood opened 1 month ago
What do we have annotated as "residue extensions"
I cheated and looked in the GAF file rather than querying Chado. There are 15 annotations with a residue() or modified_residue() extension.
8 of those are protein binding annotations:
PomBase SPAC12B10.10 nod1 GO:0005515 PMID:23966468 IPI PomBase:SPAC31A2.16 F medial cortical node Gef2-related protein Nod1 protein taxon:4896 20141023 PomBase residue(957-1101)
PomBase SPAC31A2.16 gef2 GO:0005515 PMID:23966468 IPI PomBase:SPAC12B10.10 F RhoGEF Gef2 protein taxon:4896 20131105 PomBase residue(329-419)
PomBase SPAC6F6.16c tpz1 GO:0005515 PMID:24013504 IPI PomBase:SPAC19G12.13c F shelterin complex subunit Tpz1 SPAC6F6.18c|mug169 protein taxon:4896 20131112 PomBase residue(426-450)
PomBase SPAC6F6.16c tpz1 GO:0005515 PMID:24013504 IPI PomBase:SPAC26H5.06 F shelterin complex subunit Tpz1 SPAC6F6.18c|mug169 protein taxon:4896 20131112 PomBase residue(488-499)
PomBase SPBC27B12.02 mis19 GO:0005515 PMID:24774534 IPI PomBase:SPCC970.12 F kinetochore protein Mis19/Eic1 SPBC30B4.10|eic1|kis1 protein taxon:4896 20140605 PomBase residue(4-63)
PomBase SPBC27B12.02 mis19 GO:0005515 PMID:24774534 IPI PomBase:SPCC1672.10 F kinetochore protein Mis19/Eic1 SPBC30B4.10|eic1|kis1 protein taxon:4896 20140605 PomBase residue(53-112)
PomBase SPCC74.02c ppn1 GO:0005515 PMID:33711009 IPI PomBase:SPBC776.02c F mRNA cleavage and polyadenylation specificity factor complex associated protein (PNUTS) protein taxon:4896 20221007 PomBase residue(506639)
PomBase SPCC74.02c ppn1 GO:0005515 PMID:33711009 IPI PomBase:SPAC824.04 F mRNA cleavage and polyadenylation specificity factor complex associated protein (PNUTS) protein taxon:4896 20221007 PomBase residue(506639)
Here are the other 7:
PomBase SPAC22E12.09c krp1 GO:0004252 PMID:9418887 IMP F kexin krp protein taxon:4896 20111021 PomBase has_input(PomBase:SPAC22E12.09c),part_of(GO:0016485),residue(S371)
PomBase SPBC428.08c clr4 GO:0043130 PMID:34524082 EXP F histone lysine H3-K9 methyltransferase (Suv39) Clr4 protein taxon:4896 20211108 PomBase residue(243-261)
PomBase SPCC1672.06c asp1 GO:0016887 PMID:35536002 IDA F diphosphoinositol pentakisphosphate kinase/IP8 pyrophosphatase vip1 protein taxon:4896 20221012 PomBase residue(1-385)
PomBase SPCC1672.06c asp1 GO:0052723 PMID:35536002 IDA F diphosphoinositol pentakisphosphate kinase/IP8 pyrophosphatase vip1 protein taxon:4896 20221012 PomBase residue(1-385)
PomBase SPCC4B3.15 mid1 GO:0008289 PMID:15572668 EXP F anillin-related medial ring protein Mid1 dmf1 protein taxon:4896 20240412 PomBase residue(681-688)
PomBase SPNCRNA.530 sno530 GO:0030563 PMID:37403782 EXP F small nucleolar RNA sno530 sncRNA taxon:4896 20230710 PomBase has_input(PomBase:SPSNRNA.06),modified_residue(A64),part_of(GO:0016180)
PomBase SPSNORNA.25 snoZ30 GO:0030563 PMID:37403782 EXP F C/D containing snoRNA Z30 mgU6-47 snoRNA taxon:4896 20230710 PomBase has_input(PomBase:SPSNRNA.06),modified_residue(A41),part_of(GO:0016180)
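The GAF lookup described above can be reproduced with a few lines of Python. This is only a minimal sketch: it assumes a standard tab-separated GAF 2.x file, where the annotation extension is column 16, and the function name `residue_annotations` is made up for illustration.

```python
import re

# Matches residue(...) or modified_residue(...) as a whole relation name,
# so a hypothetical relation such as "foo_residue(...)" would not count.
RESIDUE_RE = re.compile(r"(?<![A-Za-z_])(?:modified_)?residue\(([^)]*)\)")


def residue_annotations(lines):
    """Yield (gene_id, symbol, go_id, extension) for GAF rows whose
    annotation extension (column 16) uses residue() or modified_residue()."""
    for line in lines:
        if line.startswith("!"):  # GAF header/comment lines
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 16:  # row has no extension column
            continue
        extension = cols[15]
        if RESIDUE_RE.search(extension):
            yield cols[1], cols[2], cols[4], extension
```

To run it on a real file, pass an open file handle, for example `list(residue_annotations(open(path)))`, where `path` points at a local copy of the GAF file.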
I looked at the extensions for the protein-binding annotations. For most of them the residues indicated are on the annotated protein, which is what we want. The only exceptions are the nod1 and gef2 annotations, which are reciprocally inverted: the residues indicated in the nod1 annotation are on gef2, and the residues in the gef2 annotation are on nod1. That said, given the way the annotation is worded, it might make more sense as it is now.
I think a user who reads this annotation (on nod1) as a sentence would understand it as "nod1 binds to the site at residues 957-1101 of gef2", which is the correct reading.
For the 6 others (asp1 is there twice):
Add an extension column to the SO file so we can store residues (this is needed for all features in the "SO" file)
That's done now. The columns in the file are:
"optional" means that the field can be blank but still needs to be present, surrounded by tabs. The exceptions are qualifiers and extension, which can be left off for convenience.
Note that there aren't any qualifiers in manual_so_term_annotations.tsv, so two tabs are needed between the date and extension values.
The only currently supported extension relation that makes sense here is residue, so the extension could be residue(1-24). We can add other extension relations.
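As an illustration of the extension format discussed above, here is a minimal Python sketch of how an extension value could be split into relation/value pairs and how a residue range could be read. The function names are hypothetical and this is not the actual PomBase loader. It assumes that relations are comma-separated and that values contain no commas, and the last helper assumes the qualifier column sits directly between the date and the extension.

```python
import re

REL_RE = re.compile(r"^([A-Za-z_]+)\((.+)\)$")   # relation(value)
RANGE_RE = re.compile(r"^(\d+)-(\d+)$")          # e.g. 1-24


def parse_extension(field):
    """Split 'rel1(x),rel2(y)' into [(rel, value), ...].
    Assumption: values never contain a comma."""
    pairs = []
    for part in field.split(","):
        part = part.strip()
        if not part:
            continue
        match = REL_RE.match(part)
        if match is None:
            raise ValueError(f"not a relation(value) pair: {part!r}")
        pairs.append((match.group(1), match.group(2)))
    return pairs


def residue_span(value):
    """Return (start, end) for a range like '1-24'; return None for
    anything else, such as the single residue 'S371'."""
    match = RANGE_RE.match(value)
    if match is None:
        return None
    start, end = int(match.group(1)), int(match.group(2))
    if start > end:
        raise ValueError(f"range start is after range end: {value!r}")
    return start, end


def trailing_columns(date, extension):
    """Join date, an empty qualifier field, and the extension, giving the
    two consecutive tabs mentioned above (assumed column order)."""
    return "\t".join([date, "", extension])
```

For example, `parse_extension("residue(1-24)")` gives `[("residue", "1-24")]`, and `residue_span("1-24")` gives `(1, 24)`. A value that is not a well-formed range is returned as `None` rather than guessed at.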
krp1: we curate cleavage sites as protein features (https://www.pombase.org/term/SO:0100011 "cleaved peptide region"), so the krp1 annotation should migrate to this; then it can be displayed in the feature viewer.
asp1: the extension comes from mapping the 2 enzymatic activities of asp1 to its N-terminal portion, based on in vitro assays using partial recombinant protein. Is it really relevant to capture this information in that way? Maybe we should reflect it as phenotype annotations instead.
Agreed, we can remove these. I think this was an attempt to show which domain of the multifunctional protein had which activity, but people can figure this out from the other data (including the domain viewer)
sno530 and snoZ30: the extension refers to modifications applied by these ncRNAs to snu6. Should this type of extension be featured in the GO annotation for the activity and appear on the gene page for the modifier, or only as a modification on the target gene page? At the moment these don't appear anywhere on the snu6 page.
Well observed. There is no RNA modification ontology at the moment. We have a ticket to note them: https://github.com/pombase/curation/issues/129 We did consider building one, but it's a bit out of scope for us, and one of the RNA groups says they have one; it just has not appeared in the OBO Foundry yet. I will ask about this. So, for the moment, I think this is all we can do.
We will fix the edge cases that we can, but hold off on migrating some protein features to manual_so_term_annotations.tsv until the extensions column is read.
Update: these will be read and displayed, but they won't yet show up in the protein feature viewer.
I have fixed the krp1 and asp1 annotations.
[x] What do we have annotated as "residue extensions"
[ ] Curators review existing residues on catalytic activities; possibly remove? See https://github.com/pombase/curation/issues/3729
[x] Add an extension column to the SO file so we can store residues (this is needed for all features in the "SO" file)
[ ] Store curated active sites in the SO file (so they appear in the protein features table) and display them in the protein feature viewer