Closed kimrutherford closed 1 year ago
There should not be any annotation on pseudogenes. Which ones have them? I will take a look...
I filtered IEAs from [IPR004982], which might get rid of most. I'll check tomorrow what is left. One might need to change status and have pseudo removed, but I might need to ask about that.
(it's annotation on wtfs, but some are not functional as meiotic drivers)
Yep, all the annotated pseudogenes are WTFs.
https://www.pombase.org/results/from/id/72854300-1903-4e1c-b6bd-5c9cbca1fa0b
I think I have removed all of our annotations to pseudogenes. Can we block imported annotations on pseudogenes? (Not urgent.)
@kimrutherford are there currently any GO annotations on pseudogenes?
7 pseudogenes have GO annotation: https://www.pombase.org/results/from/id/862dfa78-bbfb-4cb4-a3c5-835839c01339
Here are the annotations:
PomBase SPBC1706.02c wtf2 GO:0016021 GO_REF:0000043 IEA UniProtKB-KW:KW-0812 C wtf element (with segmental deletion) Wtf2 SPBC1706.02 pseudogenic_transcript taxon:4896 20220718 UniProt
PomBase SPCC285.06c wtf17 GO:0016021 GO_REF:0000043 IEA UniProtKB-KW:KW-0812 C wtf element Wtf17 (no initiator methionine) pseudogenic_transcript taxon:4896 20220718 UniProt
PomBase SPCC306.10 wtf8 GO:0005737 PMID:16823372 HDA C wtf element Wtf8 (frameshifted) pseudogenic_transcript taxon:4896 20091111 PomBase
PomBase SPCC306.10 wtf8 GO:0000324 PMID:16823372 HDA C wtf element Wtf8 (frameshifted) pseudogenic_transcript taxon:4896 20091111 PomBase
PomBase SPCC306.10 wtf8 GO:0016021 GO_REF:0000043 IEA UniProtKB-KW:KW-0812 C wtf element Wtf8 (frameshifted) pseudogenic_transcript taxon:4896 20220718 UniProt
PomBase SPCC553.05c wtf6 GO:0016021 GO_REF:0000043 IEA UniProtKB-KW:KW-0812 C wtf element Wtf6 (frameshifted) SPCC553.05 pseudogenic_transcript taxon:4896 20220718 UniProt
PomBase SPCC576.16c wtf22 GO:0016021 GO_REF:0000043 IEA UniProtKB-KW:KW-0812 C wtf element Wtf22 (frameshifted) pseudogenic_transcript taxon:4896 20220718 UniProt
PomBase SPCC622.21 wtf12 GO:0016021 GO_REF:0000043 IEA UniProtKB-KW:KW-0812 C wtf element Wtf12 (truncated) pseudogenic_transcript taxon:4896 20220718 UniProt
PomBase SPCC622.21 wtf12 GO:0016020 GO_REF:0000051 NAS C wtf element Wtf12 (truncated) pseudogenic_transcript taxon:4896 20090126 PomBase
PomBase SPCC830.02 wtf24 GO:0016021 GO_REF:0000043 IEA UniProtKB-KW:KW-0812 C wtf element Wtf24 (frameshifted) pseudogenic_transcript taxon:4896 20220718 UniProt
I moved the HDA annotations to the legacy file external_data/external-go-data/GO_ORFeome_localizations2_deleted.txt and removed the GO_REF:0000051 NAS annotation from go_comp.txt
@kimrutherford can you filter out any GO annotations to pseudogenes from the GOA file on import? (I can't filter on UniProtKB-KW:KW-0812 because it is used for a lot of correct "integral to membrane" assignments.)
Rereading the original issue, you don't need to export annotations to pseudogenes, but this should also prevent us accumulating any.
A log file of "GO annotation to pseudogenes" might be useful.
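A rough sketch of what the import-time filter plus log could look like, in case it helps the discussion. This is not how the PomBase loader actually works: the function name, the `pseudogene_ids` argument and the example rows are made up for illustration. It assumes GAF-style tab-separated rows where column 2 is the object ID and column 12 is the object type (as in the rows pasted above). Rows coming straight from GOA would probably carry UniProt accessions and a `protein` type, so in practice the ID check would need to run after mapping to PomBase systematic IDs.

```python
# Types that mark a row as annotating a pseudogene (assumption: these two
# Sequence Ontology names are the ones used in the object type column).
PSEUDO_TYPES = {"pseudogene", "pseudogenic_transcript"}


def split_pseudogene_annotations(gaf_lines, pseudogene_ids=frozenset()):
    """Split GAF lines into (kept, rejected).

    A row is rejected if its object type (column 12) is a pseudogene type
    or its object ID (column 2) is in the supplied set of pseudogene IDs.
    Header/comment lines (starting with "!") and blank lines are kept.
    """
    kept, rejected = [], []
    for line in gaf_lines:
        if line.startswith("!") or not line.strip():
            kept.append(line)
            continue
        cols = line.rstrip("\n").split("\t")
        object_id = cols[1] if len(cols) > 1 else ""
        object_type = cols[11] if len(cols) > 11 else ""
        if object_type in PSEUDO_TYPES or object_id in pseudogene_ids:
            rejected.append(line)
        else:
            kept.append(line)
    return kept, rejected


def make_row(object_id, symbol, object_type):
    """Build a 17-column GAF row for the demo (values are illustrative)."""
    return "\t".join([
        "PomBase", object_id, symbol, "", "GO:0016021", "GO_REF:0000043",
        "IEA", "UniProtKB-KW:KW-0812", "C", "", "", object_type,
        "taxon:4896", "20220718", "UniProt", "", "",
    ])


example = [
    "!gaf-version: 2.2",
    make_row("SPCC622.21", "wtf12", "pseudogenic_transcript"),
    make_row("HYPOTHETICAL.01", "abc1", "protein"),  # made-up coding gene
]

kept, rejected = split_pseudogene_annotations(example)

# The rejected rows are what would go into the
# "GO annotation to pseudogenes" log file.
for row in rejected:
    print("pseudogene annotation skipped:", row.split("\t")[1])
```

Returning the rejected rows rather than silently dropping them means the same pass can feed both the load and the log, so we would notice if new pseudogene annotations start arriving from GOA.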
I've done a quick check. We don't have any annotations to pseudogenes now.
GOA have a few annotations for wtf12: https://www.uniprot.org/uniprotkb/Q8NIP8/entry
Those GOA annotations aren't being loaded: https://www.pombase.org/gene/SPCC622.21 but I'm not sure why. Maybe they are being filtered.
It seems odd that UniProt doesn't mention that it's pseudo: https://www.uniprot.org/uniprotkb/Q8NIP8/entry
I think UniProt are out of date because they are still using the old wtf structures. That should be fixed when we submit the new genome and the revised structures filter through to UniProt.
I think we must filter any GO annotations to pseudogenes somewhere...
Let's close. We can open if we need to do anything, but I tend to ignore pseudos since there are not many. We can revisit in the future.
I've just noticed that we don't include annotation for pseudogenes in the GPAD file we created for GO. The pseudogenes are in the GPI file though.
I don't know why I implemented it that way. I thought I'd check before fixing it: should we be sending annotation for pseudogenes to GO in the GPAD file?
(The GAF file includes the pseudogene annotation)