microbiomedata / WorkflowPlanning

This is primarily a repo for capturing policies and discussions around the different workflows for NMDC. This is also used for project management related pieces.

Proposal: include GO functional annotation in annotation workflows #33

Closed cmungall closed 5 months ago

cmungall commented 2 years ago

For broader context, there is a draft document here: https://docs.google.com/document/d/12ndhKQdGWHoRiWFw4TRqfIjObJSov6NmrkT3ozsSrus/edit

Briefly, this would provide many advantages for NMDC: it uses a standard, and it allows more powerful function-based search that leverages synonyms, the hierarchy, and alternate ID systems. There are two possible approaches:

  1. A mapping approach: map existing annotations (KEGG, Pfam, EC) to GO terms.
  2. Direct GO annotation using a framework like InterProScan or TreeGrafter; the NCBI PGAP tool will also do GO annotation based on their curated mappings between NCBI fams and GO terms.

The first approach is almost zero lift for the workflow team. The metadata team could provide a Python script that takes an existing annotation GFF and produces a GFF with GO annotations; we would be responsible for sourcing the mappings.
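To make the mapping approach concrete, here is a minimal sketch of what such a script might look like. It is not an existing NMDC tool. It assumes the mappings come as lines in the GO "external2go" style (for example pfam2go or ec2go), that the source IDs sit in GFF attributes named `pfam` and `ec_number` (hypothetical names; the real annotation GFF may use different keys), and that GO terms are written to the GFF3 `Ontology_term` attribute. The function names are illustrative only.

```python
import re

# Matches an external2go-style line and captures the source ID and the GO ID, e.g.
# "Pfam:PF00001 7tm_1 > GO:G protein-coupled receptor activity ; GO:0004930"
_MAPPING_LINE = re.compile(r"^(\S+).*;\s*(GO:\d{7})\s*$")


def parse_mappings(lines):
    """Build {source_id: set of GO IDs} from external2go-style lines."""
    mappings = {}
    for line in lines:
        if line.startswith("!"):  # header/comment lines in external2go files
            continue
        match = _MAPPING_LINE.match(line.strip())
        if match:
            mappings.setdefault(match.group(1), set()).add(match.group(2))
    return mappings


def annotate_gff_line(line, mappings, key_prefixes):
    """Return the GFF line with an Ontology_term attribute added when a mapping exists.

    key_prefixes maps a GFF attribute key to the ID prefix used in the mapping
    file, e.g. {"pfam": "Pfam:", "ec_number": "EC:"}. These keys are assumptions.
    """
    if line.startswith("#") or not line.strip():
        return line  # pass directives, comments, and blank lines through
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        return line  # not a feature line; leave untouched
    fields = [f for f in cols[8].split(";") if f and f != "."]
    attrs = dict(f.split("=", 1) for f in fields if "=" in f)

    # Start from any GO terms already present, then add mapped ones.
    go_terms = set(filter(None, attrs.get("Ontology_term", "").split(",")))
    for key, prefix in key_prefixes.items():
        for value in filter(None, attrs.get(key, "").split(",")):
            source_id = value if value.startswith(prefix) else prefix + value
            go_terms |= mappings.get(source_id, set())
    if not go_terms:
        return line

    # Replace any existing Ontology_term field with the merged, sorted set.
    fields = [f for f in fields if not f.startswith("Ontology_term=")]
    fields.append("Ontology_term=" + ",".join(sorted(go_terms)))
    cols[8] = ";".join(fields)
    return "\t".join(cols) + "\n"
```

A driver would read the mapping files once with `parse_mappings`, then stream the input GFF through `annotate_gff_line` and write each returned line to the output file. KEGG-based mappings would need an equivalent mapping source, which this sketch does not assume.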

The second approach would yield more precise and more accurate annotations, but it is a bigger lift. InterProScan can be awkward to run and is resource-heavy, but I don't think we should dismiss it out of hand. Joe Carlson from Phytozome has a lot of experience running InterProScan efficiently in workflows, which we can draw on. There may also be other approaches, including other pipelines; it may make sense to discuss these separately rather than outlining them in this ticket.

aclum commented 5 months ago

@ssarrafan this is a milestone, can you check for overlapping tickets in the issues repo?

ssarrafan commented 5 months ago

@aclum here is what I found: https://github.com/microbiomedata/issues/issues/522 https://github.com/microbiomedata/issues/issues/518

aclum commented 5 months ago

Closing in favor of https://github.com/microbiomedata/issues/issues/518