pombase / pombase-chado

PomBase code for accessing Chado
MIT License

Should we be sending annotation for pseudogenes to GO? #1015

Closed kimrutherford closed 1 year ago

kimrutherford commented 2 years ago

I've just noticed that we don't include annotation for pseudogenes in the GPAD file we created for GO. The pseudogenes are in the GPI file though.

I don't know why I implemented it that way. I thought I'd check before fixing it: should we be sending annotation for pseudogenes to GO in the GPAD file?

(The GAF file includes the pseudogene annotation)

ValWood commented 2 years ago

There should not be any annotation on pseudogenes. Which ones have them? I will take a look...

ValWood commented 2 years ago

I filtered IEAs from [IPR004982]; that might get rid of most. I'll check tomorrow what is left. One might need to change status and have pseudo removed, but I might need to ask about that.

ValWood commented 2 years ago

(it's annotation on wtfs, but some are not functional as meiotic drivers)

kimrutherford commented 2 years ago

Yep, all the annotated pseudogenes are WTFs.

https://www.pombase.org/results/from/id/72854300-1903-4e1c-b6bd-5c9cbca1fa0b

ValWood commented 2 years ago

I think I have removed all of our annotations to pseudogenes. Can we block imported annotation on pseudogenes? (Not urgent.)

ValWood commented 2 years ago

@kimrutherford are there currently any GO annotations on pseudogenes?

kimrutherford commented 2 years ago

Are there currently any GO annotations on pseudogenes?

7 pseudogenes have GO annotation: https://www.pombase.org/results/from/id/862dfa78-bbfb-4cb4-a3c5-835839c01339

kimrutherford commented 2 years ago

7 pseudogenes have GO annotation:

Here are the annotations:

PomBase SPBC1706.02c    wtf2            GO:0016021      GO_REF:0000043  IEA     UniProtKB-KW:KW-0812    C       wtf element (with segmental deletion) Wtf2      SPBC1706.02     pseudogenic_transcript  taxon:4896      20220718        UniProt
PomBase SPCC285.06c     wtf17           GO:0016021      GO_REF:0000043  IEA     UniProtKB-KW:KW-0812    C       wtf element Wtf17 (no initiator methionine)             pseudogenic_transcript  taxon:4896      20220718        UniProt
PomBase SPCC306.10      wtf8            GO:0005737      PMID:16823372   HDA             C       wtf element Wtf8 (frameshifted)         pseudogenic_transcript  taxon:4896      20091111        PomBase
PomBase SPCC306.10      wtf8            GO:0000324      PMID:16823372   HDA             C       wtf element Wtf8 (frameshifted)         pseudogenic_transcript  taxon:4896      20091111        PomBase
PomBase SPCC306.10      wtf8            GO:0016021      GO_REF:0000043  IEA     UniProtKB-KW:KW-0812    C       wtf element Wtf8 (frameshifted)         pseudogenic_transcript  taxon:4896      20220718        UniProt
PomBase SPCC553.05c     wtf6            GO:0016021      GO_REF:0000043  IEA     UniProtKB-KW:KW-0812    C       wtf element Wtf6 (frameshifted) SPCC553.05      pseudogenic_transcript  taxon:4896      20220718        UniProt
PomBase SPCC576.16c     wtf22           GO:0016021      GO_REF:0000043  IEA     UniProtKB-KW:KW-0812    C       wtf element Wtf22 (frameshifted)                pseudogenic_transcript  taxon:4896      20220718        UniProt
PomBase SPCC622.21      wtf12           GO:0016021      GO_REF:0000043  IEA     UniProtKB-KW:KW-0812    C       wtf element Wtf12 (truncated)           pseudogenic_transcript  taxon:4896      20220718        UniProt
PomBase SPCC622.21      wtf12           GO:0016020      GO_REF:0000051  NAS             C       wtf element Wtf12 (truncated)           pseudogenic_transcript  taxon:4896      20090126        PomBase
PomBase SPCC830.02      wtf24           GO:0016021      GO_REF:0000043  IEA     UniProtKB-KW:KW-0812    C       wtf element Wtf24 (frameshifted)                pseudogenic_transcript  taxon:4896      20220718        UniProt

ValWood commented 2 years ago

I moved the HDA annotations to the legacy file external_data/external-go-data/GO_ORFeome_localizations2_deleted.txt and removed the GO_REF:0000051 NAS annotation from go_comp.txt.

@kimrutherford can you filter any imported GO annotations to pseudogenes from the GOA file on import? (I can't filter the UniProtKB-KW:KW-0812 mapping because it is used for a lot of correct "integral to membrane" assignments.)
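
A minimal sketch of what such an import-time filter could look like, written in Python for illustration only (it is not the pombase-chado implementation). It assumes the incoming lines are tab-separated GAF 2.x rows whose second column already holds the PomBase systematic ID, and that a set of pseudogene IDs is available from elsewhere. The function name `split_pseudogene_annotations` and the ID `SPXX0000.01` in the usage note are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

# 0-based index of DB_Object_ID (column 2 in GAF 2.x).
GAF_ID_COLUMN = 1


def split_pseudogene_annotations(
    lines: Iterable[str], pseudogene_ids: Set[str]
) -> Tuple[List[str], List[str]]:
    """Split GAF rows into (kept, rejected) by pseudogene ID.

    Assumption: column 2 holds the same identifiers as `pseudogene_ids`.
    Blank lines and "!" header/comment lines are skipped entirely.
    """
    kept: List[str] = []
    rejected: List[str] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line or line.startswith("!"):
            continue
        fields = line.split("\t")
        if len(fields) > GAF_ID_COLUMN and fields[GAF_ID_COLUMN] in pseudogene_ids:
            # Annotation is attached to a pseudogene: do not load it.
            rejected.append(line)
        else:
            kept.append(line)
    return kept, rejected
```

Called with the rows from the import and `{"SPBC1706.02c"}` as the pseudogene set, a wtf2 row would land in `rejected`, while a row for a made-up ID such as `SPXX0000.01` would stay in `kept`. Filtering on the gene ID rather than on the KW-0812 mapping leaves the correct "integral to membrane" assignments on other genes untouched.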

ValWood commented 2 years ago

Rereading the original issue, you don't need to export annotations to pseudogenes, but this should also prevent us accumulating any.

A log file of "GO annotation to pseudogenes" might be useful.
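
One way such a log could be produced, sketched in Python under assumptions: the rejected rows are tab-separated GAF 2.x lines, and the log only needs the gene ID, GO term, evidence code and source (columns 2, 5, 7 and 15). The function name `write_pseudogene_log` and the log layout are hypothetical, not something the PomBase loader is known to provide.

```python
from typing import Iterable, TextIO

LOG_TITLE = "GO annotation to pseudogenes"


def write_pseudogene_log(rejected_lines: Iterable[str], out: TextIO) -> int:
    """Write one summary row per rejected GAF line; return the row count.

    Assumed GAF 2.x columns (0-based): 1 = gene ID, 4 = GO ID,
    6 = evidence code, 14 = assigned_by. Short rows are padded with
    empty strings so a malformed line still appears in the log.
    """
    out.write(f"# {LOG_TITLE}\n")
    count = 0
    for raw in rejected_lines:
        fields = raw.rstrip("\n").split("\t")
        fields += [""] * (15 - len(fields))
        summary = [fields[1], fields[4], fields[6], fields[14]]
        out.write("\t".join(summary) + "\n")
        count += 1
    return count
```

Passing an open file handle as `out` gives a per-load log. A count of zero after a load would confirm that no pseudogene annotations arrived from the import.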

kimrutherford commented 1 year ago

I've done a quick check. We don't have any annotations to pseudogenes now.

GOA have a few annotations for wtf12: https://www.uniprot.org/uniprotkb/Q8NIP8/entry

Those GOA annotations aren't being loaded: https://www.pombase.org/gene/SPCC622.21 but I'm not sure why. Maybe they are being filtered.

It seems odd that UniProt doesn't mention that it's a pseudogene: https://www.uniprot.org/uniprotkb/Q8NIP8/entry

ValWood commented 1 year ago

I think UniProt is out of date because they are still using the old wtf structures. That should be fixed when we submit the new genome and the revised structures filter through to UniProt.

I think we must filter any GO annotations to pseudogenes somewhere...

ValWood commented 1 year ago

Let's close. We can open if we need to do anything, but I tend to ignore pseudos since there are not many. We can revisit in the future.