microbiomedata / nmdc-server

Data portal client and server for NMDC.
https://data.microbiomedata.org

Data Portal - backend support GO facet search via mappings - Pfam #1388

Open aclum opened 2 months ago

aclum commented 2 months ago

Searching for a GO term in the data portal should go through a mapping file, since GO terms aren't stored directly in the data files.

The Pfam -> GO mapping file is https://current.geneontology.org/ontology/external2go/pfam2go

The code should be able to handle multiple mapping files; there will be a related ticket for the KEGG to GO mapping file.
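One way to keep the lookup independent of any single mapping file is to reduce every file to (source term, GO ID) pairs and merge them into one reverse index. The following is a minimal sketch under that assumption, not the portal's actual implementation; `build_go_index` and `expand_go_term` are hypothetical names, and the source labels `"pfam"` and `"kegg"` are illustrative only.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Each mapping source yields (source_term, go_id) pairs,
# e.g. ("Pfam:PF00001", "GO:0004930").
Pair = Tuple[str, str]


def build_go_index(sources: Dict[str, Iterable[Pair]]) -> Dict[str, Set[str]]:
    """Merge any number of mapping sources into one GO ID -> source terms index."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for pairs in sources.values():
        for source_term, go_id in pairs:
            index[go_id].add(source_term)
    return dict(index)


def expand_go_term(index: Dict[str, Set[str]], go_id: str) -> List[str]:
    """Return the annotation terms to search for when a user picks a GO ID."""
    return sorted(index.get(go_id, set()))


# Hypothetical usage: one entry per mapping file (Pfam now, KEGG later).
index = build_go_index(
    {
        "pfam": [("Pfam:PF00001", "GO:0004930")],
        "kegg": [("K00001", "GO:0004022"), ("K00001", "GO:0004023")],
    }
)
```

With this shape, adding the KEGG mapping later would only mean adding another entry to the `sources` dictionary.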

ssarrafan commented 1 month ago

@naglepuff This doesn't appear to be active. Can I move this to the next sprint or backlog @aclum ?

naglepuff commented 1 month ago

@aclum

To clarify, here's a row from the GO/Pfam mapping file:

Pfam:PF00001 7tm_1 > GO:G protein-coupled receptor activity ; GO:0004930

and here is a row from the GO/KEGG mapping file:

K00001  [GO:0004022 0004023 0004024 0004025]

It looks to me like there's less information in the GO/KEGG mapping file; specifically, the descriptions are missing.
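For illustration, here is a minimal sketch of parsing the two row shapes quoted above. It assumes every row follows exactly the format of those two examples and that lines starting with `!` in the Pfam file are header comments to skip; the function names are hypothetical.

```python
import re
from typing import List, Optional, Tuple

# Matches rows shaped like: K00001  [GO:0004022 0004023 0004024 0004025]
_KEGG_ROW = re.compile(r"^(K\d+)\s+\[GO:([\d\s]+)\]")


def parse_pfam2go_line(line: str) -> Optional[Tuple[str, str, str]]:
    """Return (pfam_id, go_id, go_label), or None for comments and malformed rows."""
    line = line.strip()
    if not line or line.startswith("!"):
        return None
    left, _, right = line.partition(" > ")
    label, _, go_id = right.rpartition(" ; ")
    if not left or not go_id:
        return None
    pfam_id = left.split(maxsplit=1)[0]
    # The label is written as "GO:<name>"; drop the prefix to keep just the name.
    if label.startswith("GO:"):
        label = label[3:]
    return pfam_id, go_id.strip(), label.strip()


def parse_kegg2go_line(line: str) -> List[Tuple[str, str]]:
    """Return (kegg_id, go_id) pairs; this row format carries no descriptions."""
    match = _KEGG_ROW.match(line.strip())
    if match is None:
        return []
    kegg_id = match.group(1)
    return [(kegg_id, f"GO:{number}") for number in match.group(2).split()]
```

Only the Pfam parser can return a label, which is the gap described above: GO IDs that appear only in the KEGG file would have no description from these two files alone.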

Is there a file that maps GO terms (and by that I mean IDs like GO:9999999) to their descriptions? I guess I can get many of them from the Pfam/GO mapping file, but my concern is that I might miss some if there are GO terms that map to KEGG terms but not to Pfam terms.

aclum commented 1 month ago

looping in @sierra-moxon

sierra-moxon commented 1 month ago

https://purl.obolibrary.org/obo/go/go-basic.json <-- this should have the terms, definitions, etc needed for our use cases. https://geneontology.org/docs/download-ontology/ defines the 'go-basic' file contents. The PURL above is for the JSON representation of the go-basic content.
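As a rough sketch of how term names could be pulled from that file: the code below assumes the JSON follows the OBO Graphs layout (a top-level `graphs` list whose `nodes` carry an `id` IRI and an `lbl` name), which should be verified against the downloaded file. `go_labels` is a hypothetical helper, and it operates on an already-parsed document rather than fetching the URL.

```python
from typing import Any, Dict

# Assumed IRI form for GO classes in the JSON file.
GO_IRI_PREFIX = "http://purl.obolibrary.org/obo/GO_"


def go_labels(doc: Dict[str, Any]) -> Dict[str, str]:
    """Map CURIE-style GO IDs (e.g. "GO:0004930") to their term names."""
    labels: Dict[str, str] = {}
    for graph in doc.get("graphs", []):
        for node in graph.get("nodes", []):
            node_id = node.get("id", "")
            label = node.get("lbl")
            # Skip non-GO nodes and nodes without a name.
            if node_id.startswith(GO_IRI_PREFIX) and label:
                labels["GO:" + node_id[len(GO_IRI_PREFIX):]] = label
    return labels


# Tiny hand-written document in the assumed shape, for illustration only.
sample = {
    "graphs": [
        {
            "nodes": [
                {
                    "id": "http://purl.obolibrary.org/obo/GO_0004930",
                    "lbl": "G protein-coupled receptor activity",
                },
                {"id": "http://purl.obolibrary.org/obo/GO_0000000"},
            ]
        }
    ]
}
```

A lookup built this way would cover GO IDs that appear only in the KEGG mapping, since it does not depend on the descriptions embedded in the Pfam file.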

(As an aside, I'm working on a generic ontology loader that will take ontology files and move them into MongoDB so that term names, descriptions, cross references, and synonym metadata will be available directly for ontologies like GO, ENVO, and ChEBI. Here it is: https://github.com/sierra-moxon/ontology_loader. It's just in my own org for ease of dev, but I'm happy to move/demo/etc. when it's far enough along.)

ssarrafan commented 1 month ago

@aclum @naglepuff moving to next sprint but please let me know if you can't work on it in the next 2 weeks.

ssarrafan commented 3 weeks ago

Mike mentioned this would be next on his list after COG/PFAM so I'll move to the next sprint. @naglepuff @aclum

ssarrafan commented 1 week ago

Assuming this is still Mike's next priority? @naglepuff @aclum let me know if this should be in the backlog instead

aclum commented 1 week ago

Yes, this is the next priority. It is related to milestone 2.12.2 which is due this quarter (end of Dec) so we need to work on this so it can be part of the December release.