Status: Open. aclum opened this issue 2 months ago.
@naglepuff This doesn't appear to be active. Can I move this to the next sprint or backlog @aclum ?
@aclum
To clarify, here's a row from the GO/Pfam mapping file:
Pfam:PF00001 7tm_1 > GO:G protein-coupled receptor activity ; GO:0004930
and here is a row from the GO/KEGG mapping file:
K00001 [GO:0004022 0004023 0004024 0004025]
It looks to me like there's less information in the GO/KEGG mapping file; specifically, the descriptions are missing.
Is there a file that maps GO terms (and by that I mean IDs of the form GO:9999999) to their descriptions? I guess I can get many of them from the Pfam/GO mapping file. My concern is that I might miss some if there are GO terms that map to KEGG terms but not to Pfam terms.
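For reference, here is a minimal sketch of how the two row formats quoted above could be parsed. It assumes every row looks like the single example shown for each file; the regular expressions and the function names `parse_pfam_row` and `parse_kegg_row` are hypothetical and not taken from any existing code.

```python
import re

# Assumed shape of a Pfam/GO row, based only on the one example quoted above:
# "Pfam:PF00001 7tm_1 > GO:G protein-coupled receptor activity ; GO:0004930"
PFAM_ROW = re.compile(r"^Pfam:(PF\d+)\s+(\S+)\s+>\s+GO:(.+?)\s+;\s+(GO:\d{7})$")

# Assumed shape of a GO/KEGG row, based only on the one example quoted above:
# "K00001 [GO:0004022 0004023 0004024 0004025]"
KEGG_ROW = re.compile(r"^(K\d{5})\s+\[GO:([\d\s]+)\]$")


def parse_pfam_row(line):
    """Return (pfam_id, go_id, go_description), or None if the line does not match."""
    match = PFAM_ROW.match(line.strip())
    if match is None:
        return None
    pfam_id, _pfam_name, description, go_id = match.groups()
    return pfam_id, go_id, description


def parse_kegg_row(line):
    """Return (ko_id, [go_ids]), or None if the line does not match.

    Only the first number carries the "GO:" prefix in the file, so the prefix
    is added back to every number here.
    """
    match = KEGG_ROW.match(line.strip())
    if match is None:
        return None
    ko_id, numbers = match.groups()
    return ko_id, ["GO:" + number for number in numbers.split()]
```

The Pfam parser returns a description alongside each GO ID, while the KEGG parser can only return IDs, which is the gap described above.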
looping in @sierra-moxon
https://purl.obolibrary.org/obo/go/go-basic.json <-- this should have the terms, definitions, etc needed for our use cases.
https://geneontology.org/docs/download-ontology/ defines the 'go-basic' file contents. The PURL above is for the JSON representation of the go-basic content.
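As a rough sketch of how the ID-to-description lookup could be built from that file: the code below assumes the JSON follows the OBO Graphs layout (a top-level `graphs` list whose `nodes` carry a full-IRI `id` and a label in `lbl`). That layout should be checked against the downloaded file before relying on it. The function names `go_labels` and `load_go_labels` are hypothetical.

```python
import json

# Assumption: GO term nodes are identified by IRIs with this prefix.
OBO_PREFIX = "http://purl.obolibrary.org/obo/GO_"


def go_labels(obograph):
    """Map 'GO:0004930'-style IDs to term labels from a parsed JSON document."""
    labels = {}
    for graph in obograph.get("graphs", []):
        for node in graph.get("nodes", []):
            node_id = node.get("id", "")
            label = node.get("lbl")
            # Skip non-GO nodes (e.g. relations) and nodes without a label.
            if node_id.startswith(OBO_PREFIX) and label is not None:
                labels["GO:" + node_id[len(OBO_PREFIX):]] = label
    return labels


def load_go_labels(path):
    """Read a locally downloaded copy of the JSON file and build the lookup."""
    with open(path, encoding="utf-8") as handle:
        return go_labels(json.load(handle))
```

A lookup built this way would cover GO IDs that appear only in the KEGG mapping, since it does not depend on the Pfam file for descriptions.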
(As an aside, I'm working on a generic ontology loader that will take ontology files and load them into MongoDB so that term names, descriptions, cross-references, and synonym metadata will be available directly for ontologies like GO, ENVO, and ChEBI. Here it is: https://github.com/sierra-moxon/ontology_loader. It's just in my own org for ease of dev, but I'm happy to move/demo/etc. when it's far enough along.)
@aclum @naglepuff moving to next sprint, but please let me know if you can't work on it in the next 2 weeks.
Mike mentioned this would be next on his list after COG/PFAM so I'll move to the next sprint. @naglepuff @aclum
Assuming this is still Mike's next priority? @naglepuff @aclum let me know if this should be in the backlog instead
Yes, this is the next priority. It is related to milestone 2.12.2 which is due this quarter (end of Dec) so we need to work on this so it can be part of the December release.
Searching for a GO term in the data portal should look the term up in a mapping file, since GO terms aren't stored directly in the data files.
The Pfam -> GO mapping file is https://current.geneontology.org/ontology/external2go/pfam2go
The code should be able to handle multiple mapping files. There will be a related ticket for the KEGG to GO mapping file.
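One possible shape for handling multiple mapping files, offered as a sketch rather than a proposed design: each file gets its own small row parser, and a shared function merges the results into one GO ID to annotation ID index. The names `build_go_index` and `parse_pfam2go` are hypothetical, and the parser assumes pfam2go rows end in `; GO:nnnnnnn` and that lines starting with `!` are comments.

```python
from collections import defaultdict


def build_go_index(sources):
    """Merge several mapping files into one GO ID -> set of annotation IDs index.

    `sources` is an iterable of (lines, parse_row) pairs. Each parse_row turns
    one line into (annotation_id, [go_ids]) or returns None for lines to skip.
    """
    index = defaultdict(set)
    for lines, parse_row in sources:
        for line in lines:
            parsed = parse_row(line)
            if parsed is None:
                continue
            annotation_id, go_ids = parsed
            for go_id in go_ids:
                index[go_id].add(annotation_id)
    return dict(index)


def parse_pfam2go(line):
    """Hypothetical pfam2go row parser (format assumed, see the note above)."""
    # Assumption: "!" marks a comment line; rows without ";" are skipped.
    if line.startswith("!") or ";" not in line:
        return None
    pfam_id = line.split()[0].split(":", 1)[1]   # "Pfam:PF00001" -> "PF00001"
    go_id = line.rsplit(";", 1)[1].strip()       # text after the last ";"
    return pfam_id, [go_id]
```

With this layout, supporting the KEGG file would mean adding one more parser and one more entry in `sources`, and a GO term search would become a lookup in the merged index followed by a search on the returned Pfam or KEGG IDs.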