aclum closed this issue 2 months ago.
Hi @aclum, the Issue description says "This PR". Did you mean to link to a PR here?
I can help with the "update image on Spin" (a.k.a. build and publish a new container image to GHCR and configure a Spin workload to run it) portions of this task.
Yes, sorry, link updated in the description.
@mbthornton-lbl to start on this this sprint and pair/hand off to @eecavanna for the SPIN portion.
Example for a unit test:
Example GFF:
nmdc:wfmtan-11-5rqhd817.1_0000001 Prodigal v2.6.3_patched CDS 2931 5588 340.0 + 0 ID=nmdc:wfmgan-11-5rqhd817.1_0000001_2931_5588;translation_table=11;start_type=ATG;product=O-antigen biosynthesis protein;product_source=KO:K20444;cath_funfam=3.20.20.80,3.90.550.10;cog=COG0463;ko=KO:K20444;ec_number=EC:2.4.1.-;pfam=PF00535,PF02836;superfamily=51445,53448
nmdc:wfmgat-11-5rqhd817.1_0000001 Prodigal v2.6.3_patched CDS 5585 7381 320.4 + 0 ID=nmdc:wfmgan-11-5rqhd817.1_0000001_5585_7381;translation_table=11;start_type=ATG;product=ATP-binding cassette, subfamily B, bacterial;product_source=KO:K06147;cath_funfam=1.20.1560.10,3.40.50.300;cog=COG1132;ko=KO:K06147;pfam=PF00005,PF00664;smart=SM00382;superfamily=52540,90123
The example GFF above is expected to insert two documents into `functional_annotation_agg`: one document for `K20444` and one for `K06147`.
Expected new Mongo records:
[{"metagenome_annotation_id":"nmdc:wfmgan-11-5rqhd817.1",
"gene_function_id":"KEGG.ORTHOLOGY:K20444",
"count":1},
{"metagenome_annotation_id":"nmdc:wfmgan-11-5rqhd817.1",
"gene_function_id":"KEGG.ORTHOLOGY:K06147",
"count":1}
]
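A minimal sketch of the counting logic such a unit test would exercise, assuming the aggregator reads the `ko=` attribute from column 9 of each tab-separated GFF feature line and rewrites the `KO:` prefix to `KEGG.ORTHOLOGY:`. The function names and the prefix mapping are illustrative, not the aggregator's actual code, and the annotation ID is passed in rather than parsed from the GFF.

```python
from collections import Counter
from typing import Iterable


def parse_attributes(column9: str) -> dict[str, str]:
    """Parse a GFF3 attribute column (key=value pairs separated by ';')."""
    attrs = {}
    for pair in column9.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)  # split on the first '=' only
            attrs[key] = value
    return attrs


def kegg_counts(gff_lines: Iterable[str]) -> Counter:
    """Count KEGG Orthology terms across feature lines of a functional annotation GFF."""
    counts: Counter = Counter()
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and GFF directives/comments
        columns = line.rstrip("\n").split("\t")
        if len(columns) < 9:
            continue  # not a full feature line
        attrs = parse_attributes(columns[8])
        # The ko attribute may hold several comma-separated terms.
        for term in filter(None, attrs.get("ko", "").split(",")):
            # Assumed CURIE mapping: "KO:K20444" -> "KEGG.ORTHOLOGY:K20444"
            counts[term.replace("KO:", "KEGG.ORTHOLOGY:", 1)] += 1
    return counts


def agg_records(annotation_id: str, gff_lines: Iterable[str]) -> list[dict]:
    """Build one functional_annotation_agg-style document per KEGG term (first-seen order)."""
    return [
        {"metagenome_annotation_id": annotation_id, "gene_function_id": term, "count": n}
        for term, n in kegg_counts(gff_lines).items()
    ]
```

Keeping the annotation ID as a parameter means the same counting code could serve both metagenome and metatranscriptome annotations.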
An example functional test would be to use `json:submit` to submit `metatranscriptome_annotation_set` and corresponding `data_object_set` records to runtime:dev, including a `data_object_set` record with a `data_object_type` of `Functional Annotation GFF` whose GFF contains KEGG terms.
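A skeleton of that functional test's request, built but not sent. The endpoint path (`/metadata/json:submit`), the bearer-token header, the class name `nmdc:MetatranscriptomeAnnotation`, and the placeholder IDs are all assumptions; the payload also omits other slots the schema likely requires, so check it against the schema version deployed on runtime:dev (e.g. with `json:validate`) before submitting.

```python
import json
import urllib.request

# Hypothetical placeholder IDs; a real test would mint IDs that are valid on runtime:dev.
ANNOTATION_ID = "nmdc:wfmtan-00-000001.1"
GFF_OBJECT_ID = "nmdc:dobj-00-000001"


def build_payload(annotation_id: str, gff_object_id: str, gff_url: str) -> dict:
    """Assemble a Database-shaped document with one annotation record and its GFF data object."""
    return {
        "metatranscriptome_annotation_set": [
            {
                "id": annotation_id,
                "type": "nmdc:MetatranscriptomeAnnotation",  # assumed class name
                "has_output": [gff_object_id],  # links the annotation to its GFF
            }
        ],
        "data_object_set": [
            {
                "id": gff_object_id,
                "type": "nmdc:DataObject",
                "name": "functional_annotation.gff",
                "description": "Functional annotation GFF containing KEGG terms",
                "data_object_type": "Functional Annotation GFF",
                "url": gff_url,
            }
        ],
    }


def build_submit_request(base_url: str, token: str, payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST of the payload to the assumed json:submit endpoint."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/metadata/json:submit",  # assumed path
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {token}"},
        method="POST",
    )
```

Sending it would then be `urllib.request.urlopen(request)`, after which the test would check that `functional_annotation_agg` gained the expected KEGG documents.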
Hi @mbthornton-lbl, once you're ready to test this out in the development environment on Spin, you can @-mention me in a comment here that says that; in response to which I'll go through these steps to create a GitHub Release of this repo (which will automatically create and publish a new container image to GitHub Container Registry) and then update the associated workload in the development environment so that it runs that new container image; at which point I'll hand things back off to you to test it in that environment. I think that's the longest sentence I've written all year!
@eecavanna emptied out the dev Mongo database last night in https://github.com/microbiomedata/infra-admin/issues/120, and it was repopulated today with 25 million records, which looks correct. There was an issue with the dev Mongo to dev Postgres ingest, but @naglepuff restarted it just a few minutes ago. I'd like to make sure this works properly before we apply it to production.
@naglepuff said the ingest on dev went smoothly, so we are good to update the image in Spin prod. @chienchi, if you don't have permission to do that, please coordinate with @eecavanna.
Hi @chienchi, the process is exactly the same in the production environment (namespace: `nmdc`) as in the development environment (namespace: `nmdc-dev`), except:
- The deployment is named `aggregations` (plural) in the production environment and `aggregation` (singular) in the development environment. I assume whoever created those deployments named them differently by mistake.
- The `aggregations` deployment in the production environment is currently running image `ghcr.io/microbiomedata/nmdc-aggregator:1.0.6`, while the `aggregation` deployment in the development environment is currently running image `ghcr.io/microbiomedata/nmdc-aggregator:1.0.8`.
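If the image update were scripted with `kubectl` instead of done by hand (the thread doesn't say which method the steps use), the two environments would differ only in a small lookup table. This is a sketch under that assumption; the container name is a hypothetical placeholder, so check the deployment spec for the real one.

```python
import shlex

# Namespace and deployment names per environment; note the plural/singular mismatch.
DEPLOYMENTS = {
    "production": {"namespace": "nmdc", "deployment": "aggregations"},
    "development": {"namespace": "nmdc-dev", "deployment": "aggregation"},
}

IMAGE_REPO = "ghcr.io/microbiomedata/nmdc-aggregator"


def set_image_command(environment: str, tag: str, container: str) -> list[str]:
    """Build a `kubectl set image` argv that points one environment's deployment at a new tag."""
    target = DEPLOYMENTS[environment]
    return [
        "kubectl", "set", "image",
        f"deployment/{target['deployment']}",
        f"{container}={IMAGE_REPO}:{tag}",
        "--namespace", target["namespace"],
    ]


if __name__ == "__main__":
    # "aggregator" is a hypothetical container name; 1.0.8 is the tag dev currently runs.
    print(shlex.join(set_image_command("production", "1.0.8", "aggregator")))
```

Keying on the environment rather than the deployment name keeps the plural/singular quirk in one place.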
Thanks for the instructions. I have deployed the new version to the production environment.
Great! Looks good to me (thanks for including a screenshot).
In that case, I will do the task in this follow-on ticket (https://github.com/microbiomedata/infra-admin/issues/123), which is to empty out the `functional_annotation_agg` collection in the production Mongo database. I'll leave it to y'all to close this ticket when you want to.
This PR adds new Database slots for metatranscriptomes, so we need to make sure the metatranscriptome annotation records are included in the KEGG aggregation results.
cc @eecavanna
Depends on https://github.com/microbiomedata/nmdc_automation/issues/195
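A minimal in-memory sketch of the selection step this change implies, assuming the aggregator finds annotation records through `has_output` data objects whose `data_object_type` is `Functional Annotation GFF`. The real job presumably queries Mongo rather than a dict, and the function and constant names here are hypothetical.

```python
from typing import Iterator

GFF_TYPE = "Functional Annotation GFF"
# Collections whose records should feed the KEGG aggregation; the second one is the new addition.
ANNOTATION_SETS = ("metagenome_annotation_set", "metatranscriptome_annotation_set")


def annotations_with_gff(database: dict) -> Iterator[tuple[str, str]]:
    """Yield (annotation_id, gff_data_object_id) pairs from a Database-shaped dict."""
    # IDs of every data object that is a functional annotation GFF.
    gff_ids = {
        obj["id"]
        for obj in database.get("data_object_set", [])
        if obj.get("data_object_type") == GFF_TYPE
    }
    for set_name in ANNOTATION_SETS:
        for record in database.get(set_name, []):
            for output_id in record.get("has_output", []):
                if output_id in gff_ids:
                    yield record["id"], output_id
```

Listing the collections in one tuple makes "include metatranscriptomes" a one-line change rather than a second code path.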