Closed cmungall closed 5 months ago
@ssarrafan this is a milestone, can you check for overlapping tickets in the issues repo?
@aclum here is what I found: https://github.com/microbiomedata/issues/issues/522 https://github.com/microbiomedata/issues/issues/518
Closing in favor of https://github.com/microbiomedata/issues/issues/518
For broader context, there is a draft document here: https://docs.google.com/document/d/12ndhKQdGWHoRiWFw4TRqfIjObJSov6NmrkT3ozsSrus/edit
but briefly, this would provide many advantages for NMDC: using a standard and allowing more powerful function-based search that leverages synonyms, hierarchy, alternate ID systems, ...
The first approach is almost zero lift for the workflow team. The metadata team could provide a Python script that takes the existing annotation GFF and produces a GFF with GO annotations; we would be responsible for sourcing the mappings.
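To make the first approach concrete, here is a rough sketch of what such a script might look like. It is not an agreed design. It assumes the existing GFF carries annotation IDs in attributes named `ko` and `ec_number` (hypothetical key names, the real files may differ), and that the mappings arrive as a simple ID-to-GO dictionary. The function names and the example mapping entry are illustrative only. GO terms are written to the GFF3 `Ontology_term` attribute.

```python
from typing import Dict, Iterable, Iterator, List, Sequence

# Illustrative mapping only; real mappings would be sourced by the metadata team.
EXAMPLE_MAPPING: Dict[str, List[str]] = {
    "EC:1.1.1.1": ["GO:0004022"],
}


def parse_attributes(column: str) -> Dict[str, str]:
    """Parse a GFF3 attribute column (key=value;key=value) into an ordered dict."""
    attrs: Dict[str, str] = {}
    for part in column.strip().split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            attrs[key] = value
    return attrs


def add_go_terms(
    gff_lines: Iterable[str],
    mapping: Dict[str, List[str]],
    source_keys: Sequence[str] = ("ko", "ec_number"),  # assumed attribute names
) -> Iterator[str]:
    """Yield GFF lines, adding Ontology_term where a mapped ID is present."""
    for line in gff_lines:
        stripped = line.rstrip("\n")
        cols = stripped.split("\t")
        # Pass comments, blank lines, and malformed rows through untouched.
        if not stripped or stripped.startswith("#") or len(cols) != 9:
            yield stripped
            continue
        attrs = parse_attributes(cols[8])
        # Start from any GO terms already on the feature, then add mapped ones.
        existing = attrs.get("Ontology_term", "")
        go_terms = [t for t in existing.split(",") if t]
        for key in source_keys:
            for ident in attrs.get(key, "").split(","):
                for go_id in mapping.get(ident, []):
                    if go_id not in go_terms:
                        go_terms.append(go_id)
        if go_terms:
            attrs["Ontology_term"] = ",".join(go_terms)
            cols[8] = ";".join(f"{k}={v}" for k, v in attrs.items())
        yield "\t".join(cols)


if __name__ == "__main__":
    example = "contig1\tprodigal\tCDS\t1\t300\t.\t+\t0\tID=gene1;ec_number=EC:1.1.1.1"
    for out_line in add_go_terms([example], EXAMPLE_MAPPING):
        print(out_line)
```

Features with no mapped ID come out unchanged, so the script could run as a post-processing step without touching the existing workflow.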
The second approach would yield more precise and more accurate annotations, but it is a bigger lift. InterProScan can be awkward to run and is resource-heavy, but I don't think we should dismiss it out of hand. Joe Carlson from Phytozome has a lot of experience running InterProScan efficiently in workflows, which we can draw on. There may also be other approaches, including other pipelines; it may make sense to discuss these separately rather than outlining them in this ticket.