@aclum @scanon
Where do we find the mag_stats.json file?
For MAGs, workflows.yaml specifies this:
Workflow Execution:
  name: "Metagenome Assembled Genomes Analysis for {id}"
  type: nmdc:MagsAnalysis
  binned_contig_num: "{outputs.final_stats_json.binned_contig_num}"
  input_contig_num: "{outputs.final_stats_json.input_contig_num}"
  low_depth_contig_num: "{outputs.final_stats_json.low_depth_contig_num}"
  mags_list: "{outputs.final_stats_json.mags_list}"
  too_short_contig_num: "{outputs.final_stats_json.too_short_contig_num}"
  unbinned_contig_num: "{outputs.final_stats_json.unbinned_contig_num}"
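One way to read these values: each `{outputs.final_stats_json.<key>}` placeholder seems to name a workflow output (`final_stats_json`) and a key inside that JSON file. Below is a minimal Python sketch of that lookup, assuming the stats file is a flat JSON object. `resolve_template` is a hypothetical helper for illustration, not the function nmdc_automation actually uses, and the sample data is made up apart from the `binned_contig_num` value quoted later in this thread.

```python
import re

# Matches a value that is exactly one placeholder: {outputs.<output_name>.<key>}
PLACEHOLDER = re.compile(r"^\{outputs\.(\w+)\.(\w+)\}$")


def resolve_template(template, outputs):
    """Return the value a placeholder points to, or the template unchanged.

    `outputs` maps an output name (e.g. "final_stats_json") to the parsed
    contents of that JSON file. Hypothetical helper, for illustration only.
    """
    match = PLACEHOLDER.match(template)
    if match is None:
        return template  # a literal such as "nmdc:MagsAnalysis"
    output_name, key = match.groups()
    return outputs[output_name][key]  # raises KeyError if the key is absent


# Illustrative parsed stats file
outputs = {"final_stats_json": {"binned_contig_num": 22281, "mags_list": []}}
print(resolve_template("{outputs.final_stats_json.binned_contig_num}", outputs))
```

If the real resolver swallows a missing key instead of raising, that would be one way for fields like `binned_contig_num` to silently drop out of the record.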
The metaMAGs workflow has this in its Output files:
|-- project_name_mags_stats.json
Where do we find the stats.json data? I assume that the WorkflowExecution entries like |-- project_name_mags_stats.json map to things in the outputs records in the metadata returned by Cromwell from the /metadata endpoint.
It would be helpful to have examples of the stats.json file and of a Cromwell metadata API response.
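Until a real example is attached, here is a rough Python sketch of how the lookup might go. A Cromwell `/metadata` response has a top-level `outputs` object whose keys are prefixed with the workflow name. The key `nmdc_mags.final_stats_json` and the path below are assumptions for illustration, and `find_output_path` is a hypothetical helper, not something from nmdc_automation.

```python
def find_output_path(metadata, output_name):
    """Find the value for `output_name` in a Cromwell metadata dict.

    Output keys are qualified with the workflow name, so this matches on
    the last dotted component. Returns None if there is no such output.
    """
    for key, value in metadata.get("outputs", {}).items():
        if key.split(".")[-1] == output_name:
            return value
    return None


# Illustrative fragment of a metadata response (not copied from a real run)
metadata = {
    "status": "Succeeded",
    "outputs": {
        "nmdc_mags.final_stats_json": "/path/to/project_name_mags_stats.json",
    },
}
print(find_output_path(metadata, "final_stats_json"))
```

The returned path could then be opened and parsed with `json.load` to get the values the workflows.yaml placeholders refer to.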
@aclum @scanon @mbthornton-lbl any chance this can be closed this sprint?
Some keys in the activity records are not being populated in Mongo. The example below is for MAGs, but this appears to be a generic problem.
For example, binned_contig_num and mags_list are missing from the MagsAnalysisActivity records.
The last records that contain this information in the Mongo prod documents are from February 2024.
Shane said this change is likely related to the shadow schema classes and referenced this for loop: https://github.com/microbiomedata/nmdc_automation/blob/acee08ecf776c0c0a6de07549f3[…]28e2c0ac02c41/nmdc_automation/workflow_automation/watch_nmdc.py
From discussions with Michael, the first place to look is the create_activity_record function: https://github.com/microbiomedata/nmdc_automation/blob/52917d816ee710c036855a8273657341d1e644d3/nmdc_automation/workflow_automation/wfutils.py#L306
For an example MAGs workflow run in /pscratch/sd/n/nmdcda/cromwell-executions/nmdc_mags/9492a397-eb30-472b-9d3b-b44b676f4652/call-finish_mags/execution, the code should check the stats file nmdc_wfmag-11-g7msr323.1_mags_stats.json. In this case, binned_contig_num should exist in the record with a value of 22281.
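A quick way to confirm the gap for a given run is to compare the stored record against its stats file. This Python sketch assumes the record is available as a plain dict and the stats file is a flat JSON object. `missing_stats_keys` is a hypothetical helper, and the key list is copied from the workflows.yaml excerpt earlier in this thread.

```python
# Keys that workflows.yaml maps from final_stats_json into the record
STATS_KEYS = [
    "binned_contig_num",
    "input_contig_num",
    "low_depth_contig_num",
    "mags_list",
    "too_short_contig_num",
    "unbinned_contig_num",
]


def missing_stats_keys(record, stats):
    """Return keys that are in the stats file but absent or different in the record."""
    return [
        key
        for key in STATS_KEYS
        if key in stats and record.get(key) != stats[key]
    ]


# Illustrative data: only binned_contig_num comes from the run described above
stats = {"binned_contig_num": 22281}
record = {"type": "nmdc:MagsAnalysis"}
print(missing_stats_keys(record, stats))
```

An empty list would mean the record carries every stat present in the file. For the run above, binned_contig_num would be expected to show up as missing.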
cc @scanon