pombase / curation

C-term/N-term annotation is inconsistent #3733

Closed: kimrutherford closed this issue 2 weeks ago

kimrutherford commented 3 weeks ago

Hi Val.

I've found that there are three ways that fusions are annotated. For implementing pombase/pombase-chado#993 it would be helpful to standardise.

SPAC1782.04/cox24 has "(C-term)" / "(N-term)" after the S. cerevisiae gene ID in the term itself:

FT                   /controlled_curation="term=orthologous to S. cerevisiae
FT                   YLR204W (C-term); date=19700101"
FT                   /controlled_curation="term=orthologous to S. cerevisiae
FT                   YNL295W (N-term); date=20120912"

These aren't displayed on the gene pages at the moment.
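A minimal sketch of reading these values back out of the flat file. It assumes EMBL-style `FT` lines where wrapped continuation lines can be rejoined with a single space, and that each `/controlled_curation` value is a semicolon-separated `key=value` list with no embedded `;`, `=` or `"` inside values. `parse_controlled_curation` is a hypothetical helper, not part of pombase-chado:

```python
import re
from typing import Dict, List


def parse_controlled_curation(ft_lines: List[str]) -> List[Dict[str, str]]:
    """Rejoin wrapped FT lines and split each /controlled_curation="k=v; k=v"
    value into a dict of fields (hypothetical helper for illustration)."""
    # Drop the "FT" prefix and indentation; rejoin wrapped lines with one space
    text = " ".join(line[2:].strip() for line in ft_lines if line.startswith("FT"))
    records = []
    for value in re.findall(r'/controlled_curation="([^"]*)"', text):
        fields = {}
        for part in value.split(";"):
            key, sep, val = part.strip().partition("=")
            if sep:  # skip fragments without a "key=" prefix
                fields[key] = val
        records.append(fields)
    return records


cox24 = [
    'FT                   /controlled_curation="term=orthologous to S. cerevisiae',
    'FT                   YLR204W (C-term); date=19700101"',
    'FT                   /controlled_curation="term=orthologous to S. cerevisiae',
    'FT                   YNL295W (N-term); date=20120912"',
]
for record in parse_controlled_curation(cox24):
    print(record)
```

Rejoining with a space matches how these term strings wrap at word boundaries; a value wrapped mid-token would pick up a spurious space.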

his2 and his7 have the C-term/N-term in brackets in the qualifier:

his2 / SPBC1711.13:

FT                   /controlled_curation="term=orthologous to S. cerevisiae
FT                   YCL030C; qualifier=YCL030C(C-term); date=19700101"

his7 / SPBC29A3.02c:

FT                   /controlled_curation="term=orthologous to S. cerevisiae
FT                   YCL030C; qualifier=YCL030C(N-term); date=19700101"

In all other cases the C-term/N-term is after a comma in the qualifier:

qualifier=SPAC19B12.13,C-term
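Before standardising, a loader has to recognise all three encodings. A minimal sketch with hypothetical function and style names; it reports which style was found and normalises the terminus to "N-term"/"C-term", but does not assume which style will become the standard:

```python
import re
from typing import Optional, Tuple

# "N-term"/"C-term", also accepting the underscore spellings "N_term"/"C_term"
TERMINUS = r"(?P<end>[NC])[-_]term"

# Style 1: terminus in brackets at the end of the term text,
#   e.g. "orthologous to S. cerevisiae YLR204W (C-term)"
TERM_SUFFIX = re.compile(r"(?P<id>\S+)\s*\(" + TERMINUS + r"\)\s*$")
# Style 2: terminus in brackets inside the qualifier, e.g. "YCL030C(N-term)"
QUAL_PARENS = re.compile(r"(?P<id>[^,()\s]+)\(" + TERMINUS + r"\)")
# Style 3: optional identifier, comma, terminus, e.g. "SPAC19B12.13,C-term"
QUAL_COMMA = re.compile(r"(?:(?P<id>[^,()\s]+),)?" + TERMINUS)


def parse_terminus(term: str, qualifier: str = "") -> Tuple[str, Optional[str], Optional[str]]:
    """Return (style, identifier, terminus) for one controlled_curation entry."""
    qualifier = qualifier.strip()
    # Check the qualifier styles first; fall back to a suffix on the term text
    for style, pattern in (("qualifier-parens", QUAL_PARENS), ("qualifier-comma", QUAL_COMMA)):
        m = pattern.fullmatch(qualifier)
        if m:
            return style, m.group("id"), m.group("end") + "-term"
    m = TERM_SUFFIX.search(term)
    if m:
        return "term-suffix", m.group("id"), m.group("end") + "-term"
    return "none", None, None


print(parse_terminus("orthologous to S. cerevisiae YLR204W (C-term)"))
print(parse_terminus("orthologous to S. cerevisiae YCL030C", "YCL030C(N-term)"))
print(parse_terminus("orthologous to S. cerevisiae RSM22", "SPAC19B12.13,N-term"))
```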
kimrutherford commented 3 weeks ago

I've just noticed that, among the comma-separated qualifiers, some use an underscore (N_term, C_term) and some a hyphen (N-term, C-term).
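Assuming the hyphen spelling is kept (most rows already use it), fixing the spelling is a single substitution. `normalise_terminus_spelling` is a hypothetical name for illustration:

```python
import re


def normalise_terminus_spelling(qualifier: str) -> str:
    """Rewrite "N_term"/"C_term" as "N-term"/"C-term"; leave the rest unchanged."""
    return re.sub(r"\b([NC])_term\b", r"\1-term", qualifier)


print(normalise_terminus_spelling("SPAC22A12.08c,N_term"))
```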

kimrutherford commented 3 weeks ago

Some are missing the gene ID in the qualifier:

 SPAC13F5.04c  │ VTA1      │ cerevisiae │ N_term
 SPBC3H7.11    │ ABP140    │ cerevisiae │ N_term
 SPAC22E12.10c │ COX15     │ sapiens    │ N-term
 SPBC21C3.07c  │ ABP140    │ cerevisiae │ C_term
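One way to repair these rows is to prefix the pombe systematic ID, since in almost all comma-style rows the identifier before the comma is the pombe gene's own ID. A sketch under that assumption; `fill_missing_id` is a hypothetical helper:

```python
import re

# A qualifier consisting only of a terminus, e.g. "N_term" or "N-term"
BARE_TERMINUS = re.compile(r"[NC][-_]term")


def fill_missing_id(pombe_id: str, qualifier: str) -> str:
    """Prefix the pombe ID to a bare terminus; return other qualifiers unchanged."""
    qualifier = qualifier.strip()
    if BARE_TERMINUS.fullmatch(qualifier):
        return f"{pombe_id},{qualifier}"
    return qualifier


print(fill_missing_id("SPAC13F5.04c", "N_term"))
```

This leaves the underscore spelling alone, so it can be combined with a separate spelling fix.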

"HGMP" looks like a typo here:

 SPBC405.01    │ GART      │ sapiens    │ HGMP:GART,N-term
 SPCC569.08c   │ GART      │ sapiens    │ HGMP:GART,C-term
kimrutherford commented 3 weeks ago

For completeness, here are all the orthologs and qualifiers from the contig files:

 pombe         │ other   │ species    │ qualifier
───────────────┼─────────┼────────────┼─────────────────────
 SPBP4H10.15   │ ACO2    │ cerevisiae │ SPBP4H10.15,N-term
 SPBP4H10.15   │ MRPL49  │ cerevisiae │ SPBP4H10.15,C-term
 SPBC530.12c   │ CAX4    │ cerevisiae │ SPBC530.12c,C-term
 SPBC16A3.11   │ RAD30   │ cerevisiae │ SPBC16A3.11,N-term
 SPBC16A3.11   │ ECO1    │ cerevisiae │ SPBC16A3.11,C-term
 SPAC806.02c   │ CFD1    │ cerevisiae │ SPAC806.02c,N-term
 SPAC806.02c   │ CIA1    │ cerevisiae │ SPAC806.02c,C-term
 SPAC6F12.05c  │ YJR142W │ cerevisiae │ SPAC6F12.05c,N-term
 SPAC6F12.05c  │ THI80   │ cerevisiae │ SPAC6F12.05c,C-term
 SPAC2C4.12c   │ TPT1    │ cerevisiae │ SPAC2C4.12c,N-term
 SPAC2C4.12c   │ YAE1    │ cerevisiae │ SPAC2C4.12c,C-term
 SPAC22E12.10c │ COX15   │ cerevisiae │ SPAC22E12.10c,N-term
 SPAC22E12.10c │ YAH1    │ cerevisiae │ SPAC22E12.10c,C-term
 SPAC22A12.08c │ YKR070W │ cerevisiae │ SPAC22A12.08c,N_term
 SPAC22A12.08c │ CRD1    │ cerevisiae │ SPAC22A12.08c,C_term
 SPAC19B12.13  │ RSM22   │ cerevisiae │ SPAC19B12.13,N-term
 SPAC19B12.13  │ COX11   │ cerevisiae │ SPAC19B12.13,C-term
 SPAC15E1.04   │ CDC21   │ cerevisiae │ SPAC15E1.04,C-term
 SPAC1420.04c  │ RSM22   │ cerevisiae │ SPAC1420.04c,N-term
 SPAC1420.04c  │ COX11   │ cerevisiae │ SPAC1420.04c,C-term
 SPAC13F5.04c  │ VTA1    │ cerevisiae │ N_term
 SPBC3H7.11    │ ABP140  │ cerevisiae │ N_term
 SPBC21C3.07c  │ ABP140  │ cerevisiae │ C_term
 SPBC530.12c   │ PPT1    │ sapiens    │ SPBC530.12c,N-term
 SPAC806.02c   │ NUBP2   │ sapiens    │ SPAC806.02c,N-term
 SPAC806.02c   │ CIAO1   │ sapiens    │ SPAC806.02c,C-term
 SPAC22E12.10c │ COX15   │ sapiens    │ N-term
 SPBC405.01    │ GART    │ sapiens    │ HGMP:GART,N-term
 SPCC569.08c   │ GART    │ sapiens    │ HGMP:GART,C-term
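A sketch of an audit over rows like these, flagging the problems raised in this thread: a missing ID, the underscore spelling, and an ID that isn't the pombe systematic ID (as in the HGMP rows). The row tuple layout and the problem labels are assumptions for illustration:

```python
import re
from typing import List, Tuple

Row = Tuple[str, str, str, str]  # (pombe ID, other-species gene, species, qualifier)

# Optional "<ID>," followed by N/C, a hyphen or underscore, and "term"
QUALIFIER = re.compile(r"(?:(?P<id>[^,]+),)?[NC](?P<sep>[-_])term")


def audit(rows: List[Row]) -> List[Tuple[str, str, str]]:
    """Return (pombe ID, other gene, problem) for each deviation from
    the "<pombe ID>,N-term" / "<pombe ID>,C-term" pattern."""
    problems = []
    for pombe, other, _species, qual in rows:
        m = QUALIFIER.fullmatch(qual)
        if m is None:
            problems.append((pombe, other, "unparseable"))
            continue
        if m.group("id") is None:
            problems.append((pombe, other, "missing ID"))
        elif m.group("id") != pombe:
            problems.append((pombe, other, "ID differs from pombe ID"))
        if m.group("sep") == "_":
            problems.append((pombe, other, "underscore"))
    return problems


rows = [
    ("SPBP4H10.15", "ACO2", "cerevisiae", "SPBP4H10.15,N-term"),
    ("SPAC22A12.08c", "CRD1", "cerevisiae", "SPAC22A12.08c,C_term"),
    ("SPAC13F5.04c", "VTA1", "cerevisiae", "N_term"),
    ("SPBC405.01", "GART", "sapiens", "HGMP:GART,N-term"),
]
for problem in audit(rows):
    print(problem)
```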
ValWood commented 3 weeks ago

From https://github.com/pombase/curation/issues/3455:

At the same time, some of the fusions are missing their human orthologs. Do these too.

Annotate human orthologs similarly to S. cerevisiae (N-term, C-term).

 Systematic ID │ Gene name │ Product description
───────────────┼───────────┼──────────────────────────────────
 SPAC22A12.08c │ crd1      │ cardiolipin synthase/ hydrolase fusion protein Crd1
 SPAC806.02c   │           │ CIA machinery CIA1/CFD1 fusion protein
 SPAC1420.04c  │ cox1101   │ cytochrome c oxidase assembly protein Cox1101/ mitochondrial ribosomal protein Rsm22 fusion protein
 SPAC19B12.13  │ cox1102   │ cytochrome c oxidase assembly protein Cox1102/ mitochondrial ribosomal protein Rsm2202, fusion protein
 SPCC1223.08c  │ dfr1      │ dihydrofolate reductase/ lysophospholipase fusion protein Dfr1
 SPAC22E12.10c │ etp1      │ mitochondrial [2Fe-2S] cluster assembly ferredoxin Etp1/ cytochrome oxidase cofactor Cox15, fusion protein
 SPAC1782.04   │ cox24     │ mitochondrial mRNA processing protein Cox24/Pet20
 SPBP4H10.15   │ aco2      │ mitochondrial ribosomal protein subunit L21/aconitate hydratase, fusion protein
 SPBC2D10.09   │ snr1      │ mitochondrial ribosomal protein subunit S47/3-hydroxyisobutyryl-CoA hydrolase Snr1
 SPBC16A3.11   │ eso1      │ mitotic cohesin N-acetyltransferase/DNA polymerase eta Eso1 fusion protein
 SPBC530.12c   │ pdf1      │ palmitoyl protein thioesterase/ dolichol pyrophosphate phosphatase fusion protein Pdf1
 SPCC1450.15   │           │ pig-F/3-ketosphinganine reductase fusion protein
 SPAC6F12.05c  │ tnr3      │ thiamine diphosphokinase Tnr3/ Nudix hydrolase fusion protein
 SPAC15E1.04   │ hal3      │ thymidylate synthase / phosphopantothenoylcysteine decarboxylase / protein phosphatase inhibitor moonlighting protein Hal3
 SPBC13E7.02   │ cwf24     │ ubiquitin-protein ligase E3/GCN5-related N acetyltransferase fusion protein
 SPCC1442.07c  │ wss2      │ ubiquitin/metalloprotease fusion protein Udp7
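Finding the fusions whose human annotation lags behind S. cerevisiae is a per-gene set difference over termini. A hypothetical sketch, assuming rows of (pombe ID, species, terminus) that use the species labels `cerevisiae` and `sapiens` from the contig-file listing:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def missing_human_termini(rows: Iterable[Tuple[str, str, str]]) -> Dict[str, List[str]]:
    """Map each pombe ID to the termini annotated for S. cerevisiae
    but not for human (hypothetical helper)."""
    termini: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for pombe, species, terminus in rows:
        termini[(pombe, species)].add(terminus)
    gaps = {}
    for (pombe, species), ends in termini.items():
        if species != "cerevisiae":
            continue
        # .get() on a defaultdict does not insert, so iteration stays safe
        missing = ends - termini.get((pombe, "sapiens"), set())
        if missing:
            gaps[pombe] = sorted(missing)
    return gaps


rows = [
    ("SPAC806.02c", "cerevisiae", "N-term"),
    ("SPAC806.02c", "cerevisiae", "C-term"),
    ("SPAC806.02c", "sapiens", "N-term"),
    ("SPAC806.02c", "sapiens", "C-term"),
    ("SPBC530.12c", "cerevisiae", "C-term"),
    ("SPBC530.12c", "sapiens", "N-term"),
]
print(missing_human_termini(rows))
```

With the contig-file rows for SPBC530.12c (CAX4 C-term for S. cerevisiae, PPT1 N-term for human), this reports a missing human C-term.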

METTL17 for cox11 https://www.biorxiv.org/content/10.1101/2022.11.24.517765v1

ValWood commented 2 weeks ago

Reviewed and added the missing human orthologs to match the S. cerevisiae ones, and made a couple of small fixes to the S. cerevisiae ones:

 uniquename    │ uniquename │ qualifier            │ reference │ date
───────────────┼────────────┼──────────────────────┼───────────┼────────────
 SPAC1420.04c  │ COX11      │ SPAC19B12.13,C-term  │           │ 2024-08-29
 SPAC1420.04c  │ METTL17    │ SPAC19B12.13,N-term  │           │ 2024-08-29
 SPAC19B12.13  │ COX11      │ SPAC19B12.13,C-term  │           │ 2024-08-29
 SPAC19B12.13  │ METTL17    │ SPAC19B12.13,N-term  │           │ 2024-08-29
 SPAC22A12.08c │ HDHD5      │ SPAC22A12.08c,N-term │           │ 2024-08-29
 SPAC22A12.08c │ CRLS1      │ SPAC22A12.08c,C-term │           │ 2024-08-29
 SPAC22E12.10c │ FDX2       │ SPAC22E12.10c,C-term │           │ 2024-08-29
 SPAC2C4.12c   │ TRPT1      │ SPAC2C4.12c,N-term   │           │ 2024-08-29
 SPAC2C4.12c   │ YEA1       │ SPAC2C4.12c,C-term   │           │ 2024-08-29
 SPAC6F12.05c  │ TPK1       │ SPAC6F12.05c,C-term  │           │ 2024-08-29
 SPAC6F12.05c  │ NUDT19     │ SPAC6F12.05c,N-term  │           │ 2024-08-29
 SPBC16A3.11   │ ESCO1      │ SPBC16A3.11,C-term   │           │ 2024-08-29
 SPBC16A3.11   │ POLH       │ SPBC16A3.11,N-term   │           │ 2024-08-29
 SPBC530.12c   │ DOLPP1     │ SPBC530.12c,C-term   │           │ 2024-08-29
 SPBP4H10.15   │ ACO2       │ SPBP4H10.15,C-term   │           │ 2024-08-29

ValWood commented 2 weeks ago

So happy to do this; it's been on my mind since before you went back to New Zealand. I think I'll treat myself to a glass of wine for this one... much easier in the flat file!