Open aclum opened 1 month ago
Alicia to debug issues with pfam_family_cogs.txt together with IMG: the file doesn't show the expected COG category for COG1385.
Per Natalia, don't use pfam_family_cogs.txt.
Start after COG implementation is complete.
I don't think this will be done by tomorrow. @aclum who should be the reviewer for this? I'm moving this to the next sprint.
@marySalvi has been reviewing the COG work. The pfam PR is still in draft, I believe: https://github.com/microbiomedata/nmdc-server/pull/1376
Related to the FY25 Q1 milestone: https://github.com/microbiomedata/issues/issues/522
Assumes records would be in the functional_annotation_agg collection with a gene_function_id prefix of 'PFAM'. Example: { "metagenome_annotation_id": "nmdc:wfmgan-11-ndgg7v31.1", "gene_function_id": "PFAM:PF02171", "count": 56 }
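A minimal sketch of how ingest might pick out the Pfam records from the aggregation documents, assuming they have the shape of the example above. The function name `pfam_rows` and the second, COG-prefixed record are hypothetical and only there to show the filter; this is not the nmdc-server implementation.

```python
PFAM_PREFIX = "PFAM:"


def pfam_rows(records):
    """Return (annotation_id, pfam_accession, count) for PFAM-prefixed records."""
    rows = []
    for rec in records:
        gene_function_id = rec["gene_function_id"]
        if not gene_function_id.startswith(PFAM_PREFIX):
            continue  # skip other annotation types
        accession = gene_function_id[len(PFAM_PREFIX):]
        rows.append((rec["metagenome_annotation_id"], accession, rec["count"]))
    return rows


records = [
    {
        "metagenome_annotation_id": "nmdc:wfmgan-11-ndgg7v31.1",
        "gene_function_id": "PFAM:PF02171",
        "count": 56,
    },
    {
        # Hypothetical non-Pfam record; the prefix format is an assumption.
        "metagenome_annotation_id": "nmdc:wfmgan-11-ndgg7v31.1",
        "gene_function_id": "COG:COG1385",
        "count": 3,
    },
]

print(pfam_rows(records))
```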
Lookup table: /global/cfs/cdirs/m3408/refdata/img/Pfam/Pfam-A-30.0/Pfam-A.clans.tsv
Columns (tab-separated): pfam_accession\tclan_accession\tclan_name\tpfam_short_name\tpfam_name
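For reference, a rough sketch of reading the lookup table into a dict keyed by pfam_accession. It assumes the file has no header row and uses the column order listed above; `load_clans` is a hypothetical helper, and the sample row is built from the identifiers in this issue (the short name "Piwi" is illustrative).

```python
import csv
import io

COLUMNS = [
    "pfam_accession",
    "clan_accession",
    "clan_name",
    "pfam_short_name",
    "pfam_name",
]


def load_clans(handle):
    """Parse a headerless, tab-separated clans file into {pfam_accession: row}."""
    reader = csv.DictReader(
        handle,
        fieldnames=COLUMNS,
        delimiter="\t",
        quoting=csv.QUOTE_NONE,  # treat quote characters in names literally
    )
    return {row["pfam_accession"]: row for row in reader}


# In practice this would be open(path, newline="") on Pfam-A.clans.tsv.
sample = io.StringIO("PF02171\tCL0219\tRNase_H\tPiwi\tPiwi domain\n")
lookup = load_clans(sample)
print(lookup["PF02171"]["clan_name"])
```

Entries that do not belong to a clan would presumably have empty clan_accession and clan_name fields, so downstream code should tolerate empty strings there.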
Acceptance criteria: the ingest process accepts Pfam terms such that they appear in the Postgres database. There will be a separate ticket for data portal front-end changes to make the functional search more generic. The front-end search will need to support search by pfam_accession (PF02171) and pfam_name (Piwi domain), as well as by clan_accession (CL0219) and clan_name (RNase_H).
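To illustrate the four search fields, here is a small in-memory sketch of a case-insensitive substring match. `search_pfam` and the row layout are hypothetical; the real search would run against Postgres and is covered by the separate front-end ticket.

```python
SEARCH_FIELDS = ("pfam_accession", "pfam_name", "clan_accession", "clan_name")


def search_pfam(rows, query):
    """Return rows where any searchable field contains the query (case-insensitive)."""
    needle = query.strip().lower()
    if not needle:
        return []
    return [
        row
        for row in rows
        if any(needle in row[field].lower() for field in SEARCH_FIELDS)
    ]


rows = [
    {
        "pfam_accession": "PF02171",
        "pfam_name": "Piwi domain",
        "clan_accession": "CL0219",
        "clan_name": "RNase_H",
    }
]

for term in ("PF02171", "piwi", "CL0219", "rnase_h"):
    print(term, len(search_pfam(rows, term)))
```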
background: Pfam clans - Pfam defines a clan as a collection of entries that have arisen from a single evolutionary origin. (from https://pfam-docs.readthedocs.io/en/latest/faq.html#:~:text=Pfam%20defines%20a%20clan%20as,available%2C%20from%20common%20sequence%20motifs.)