microbiomedata / nmdc-server

Data portal client and server for NMDC.
https://data.microbiomedata.org

Update functional annotation ingest code to support PFAM #1337

Open aclum opened 1 month ago

aclum commented 1 month ago

Related to FY25 Q1 milestone https://github.com/microbiomedata/issues/issues/522

Assumes Pfam records will be in the functional_annotation_agg collection, with a gene_function_id prefix of 'PFAM'. Example: { "metagenome_annotation_id": "nmdc:wfmgan-11-ndgg7v31.1", "gene_function_id": "PFAM:PF02171", "count": 56 }
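A minimal sketch of how ingest code might pick Pfam rows out of the aggregation records, assuming the prefix is exactly 'PFAM:' as in the example above. The function names and the non-Pfam record are hypothetical and are not taken from nmdc-server.

```python
from typing import Iterable, Iterator

# Prefix taken from the example record in this issue.
PFAM_PREFIX = "PFAM:"


def is_pfam_record(record: dict) -> bool:
    """Return True when the record's gene_function_id carries the PFAM prefix."""
    return str(record.get("gene_function_id", "")).startswith(PFAM_PREFIX)


def pfam_records(records: Iterable[dict]) -> Iterator[dict]:
    """Yield only the Pfam rows from functional_annotation_agg-style records."""
    return (r for r in records if is_pfam_record(r))


def pfam_accession(gene_function_id: str) -> str:
    """Strip the prefix, e.g. 'PFAM:PF02171' becomes 'PF02171'."""
    if not gene_function_id.startswith(PFAM_PREFIX):
        raise ValueError(f"not a Pfam term: {gene_function_id!r}")
    return gene_function_id[len(PFAM_PREFIX):]


if __name__ == "__main__":
    sample = [
        {
            "metagenome_annotation_id": "nmdc:wfmgan-11-ndgg7v31.1",
            "gene_function_id": "PFAM:PF02171",
            "count": 56,
        },
        # Hypothetical non-Pfam record, included only to show the filter.
        {"gene_function_id": "OTHER:X0001", "count": 3},
    ]
    for rec in pfam_records(sample):
        print(pfam_accession(rec["gene_function_id"]), rec["count"])
```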

Lookup table (tab-separated columns: pfam_accession, clan_accession, clan_name, pfam_short_name, pfam_name): /global/cfs/cdirs/m3408/refdata/img/Pfam/Pfam-A-30.0/Pfam-A.clans.tsv
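The lookup table could be loaded along these lines. This is a sketch only: it assumes five tab-separated columns in the order listed above, that clan columns are empty for families without a clan, and that any row whose first field does not start with "PF" (a header, for instance) can be skipped. PfamEntry and parse_pfam_clans are hypothetical names.

```python
import csv
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class PfamEntry:
    pfam_accession: str
    clan_accession: Optional[str]  # None when the family has no clan (assumption)
    clan_name: Optional[str]
    pfam_short_name: str
    pfam_name: str


def parse_pfam_clans(lines: Iterable[str]) -> Dict[str, PfamEntry]:
    """Parse Pfam-A.clans.tsv-style lines into a dict keyed by pfam_accession."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    entries: Dict[str, PfamEntry] = {}
    for row in reader:
        # Skip malformed rows and anything that is not a Pfam accession.
        if len(row) != 5 or not row[0].startswith("PF"):
            continue
        acc, clan_acc, clan_name, short_name, name = row
        entries[acc] = PfamEntry(acc, clan_acc or None, clan_name or None, short_name, name)
    return entries


if __name__ == "__main__":
    # Illustrative row built from the values in this issue; the short name is assumed.
    sample = ["PF02171\tCL0219\tRNase_H\tPiwi\tPiwi domain\n"]
    table = parse_pfam_clans(sample)
    print(table["PF02171"].clan_name)
```

With the real file, the same function can be called on an open file handle, for example parse_pfam_clans(open(path, newline="")), since csv.reader accepts any iterable of lines.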

Acceptance criteria: the ingest process accepts Pfam terms such that they appear in the Postgres database. There will be a separate ticket for data portal front end changes to make the functional search more generic. Front end search will need to support search by pfam_accession (PF02171) and pfam_name (Piwi domain), as well as by clan_accession (CL0219) and clan_name (RNase_H).
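To illustrate the four search keys named above, here is a rough sketch of a case-insensitive substring match over lookup rows held as plain dicts. It is not the portal's search implementation; the function name and the in-memory approach are assumptions, and a real version would presumably query Postgres.

```python
from typing import Dict, List, Optional

# The four fields the front end search is expected to support.
SEARCH_FIELDS = ("pfam_accession", "pfam_name", "clan_accession", "clan_name")


def search_pfam(entries: List[Dict[str, Optional[str]]], query: str) -> List[Dict[str, Optional[str]]]:
    """Return entries where any searchable field contains the query (case-insensitive)."""
    q = query.strip().lower()
    if not q:
        return []
    return [
        e for e in entries
        if any(q in (e.get(field) or "").lower() for field in SEARCH_FIELDS)
    ]


if __name__ == "__main__":
    entries = [
        {
            "pfam_accession": "PF02171",
            "pfam_name": "Piwi domain",
            "clan_accession": "CL0219",
            "clan_name": "RNase_H",
        }
    ]
    for term in ("PF02171", "piwi", "CL0219", "rnase_h"):
        print(term, len(search_pfam(entries, term)))
```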

Background on Pfam clans: Pfam defines a clan as a collection of entries that have arisen from a single evolutionary origin. (from https://pfam-docs.readthedocs.io/en/latest/faq.html#:~:text=Pfam%20defines%20a%20clan%20as,available%2C%20from%20common%20sequence%20motifs.)

aclum commented 1 month ago

Alicia to debug with IMG the issues with pfam_family_cogs.txt; the file doesn't show the expected COG category for COG1385.

aclum commented 3 weeks ago

Per Natalia, don't use pfam_family_cogs.txt.

aclum commented 3 weeks ago

Start after COG implementation is complete.

ssarrafan commented 1 week ago

I don't think this will be done by tomorrow. @aclum who should be the reviewer for this? I'm moving it to the next sprint.

aclum commented 5 days ago

@marySalvi has been reviewing the COG work. The Pfam PR is still in draft, I believe. https://github.com/microbiomedata/nmdc-server/pull/1376