theiagen / public_health_bioinformatics

Bioinformatics workflows for genomic characterization, submission preparation, and genomic epidemiology of pathogens of public health concern.
GNU General Public License v3.0

Add mutation context to Augur_PHB for Mpox #499

Closed: emily-smith1 closed this issue 2 months ago

emily-smith1 commented 5 months ago


:pushpin: Explain the Request

Add mutation context to the Mpox track of the Augur_PHB workflow.

:books: Context

In Nextstrain, users have the option to annotate the Mpox tree with the G->A or C->T fraction. Previous studies have determined that these mutations may be indicative of APOBEC3 editing, which has played a role in sustained human-to-human transmission of Mpox clade IIb. A public health laboratory has requested that we add this feature to better monitor these changes.


:chart_with_upwards_trend: Desired Behavior

These data are added to the auspice_input_json file that is output from the Augur_PHB workflow, so users can annotate by this field when viewing the phylogeny in Auspice.

:information_source: Additional Information

Helpful info from the Nextstrain team:

From our monkeypox workflow, the script takes a newick tree and an "nt_muts.json" file (view code).

The nt_muts.json file is generated by an “augur ancestral” command (view).

After that, you'll need to pull out the GA_CT_fraction for each node, either by modifying the script or by writing a new one (attached):

json_tsv.py.txt
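
For reference, pulling the per-node value out of the node-data JSON could look roughly like the sketch below. It is not the attached json_tsv.py script. It assumes the JSON follows the usual Augur node-data layout (a top-level "nodes" object keyed by node name) and that each node carries a "GA_CT_fraction" entry; the function names and the example data are made up for illustration.

```python
import csv
import io


def node_fractions(node_data, key="GA_CT_fraction"):
    """Return (node_name, value) pairs for every node that carries `key`.

    Assumes the Augur node-data layout: {"nodes": {name: {key: value, ...}}}.
    """
    rows = []
    for name, attrs in node_data.get("nodes", {}).items():
        if key in attrs:
            rows.append((name, attrs[key]))
    return rows


def to_tsv(rows, key="GA_CT_fraction"):
    """Render (node_name, value) pairs as tab-separated text with a header."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["node", key])
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical example data; in practice this dict would come from
# json.load() on the mutation_context.json file.
example = {
    "nodes": {
        "NODE_0000001": {"GA_CT_fraction": 0.5},
        "sample_A": {"GA_CT_fraction": 1.0},
        "sample_B": {},  # no value recorded for this node, so it is skipped
    }
}
print(to_tsv(node_fractions(example)), end="")
```

Nodes without the key are skipped rather than written as empty rows; whether that is the right choice depends on how the TSV is used downstream.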

kapsakcj commented 4 months ago

script here: https://github.com/nextstrain/mpox/blob/master/nextclade/scripts/mutation_context.py

kapsakcj commented 4 months ago

To test locally, run this command:

python3 scripts/mutation_context.py \
  --tree results/hmpxv1/tree.nwk \
  --mutations results/hmpxv1/nt_muts.json \
  --output mutation_context.json

This should output a mutation_context.json file.

The augur_translate step can be skipped.

Then test augur export with mutation_context.json added as an input for node_jsons.
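
A rough sketch of that export test, assuming `augur export v2` is called with its `--tree`, `--node-data`, `--output`, and `--auspice-config` options. The output path, the helper names, and the coloring title are hypothetical, and the workflow's actual task may assemble the command differently. The coloring entry is included because Auspice typically only offers a node attribute as a color-by option when the Auspice config declares it.

```python
import shlex


def add_coloring(config, key="GA_CT_fraction", title="G->A or C->T fraction"):
    """Add a continuous coloring for `key` to an Auspice config dict, once."""
    colorings = config.setdefault("colorings", [])
    if not any(c.get("key") == key for c in colorings):
        colorings.append({"key": key, "title": title, "type": "continuous"})
    return config


def export_command(tree, node_jsons, output, auspice_config=None):
    """Build the argument list for `augur export v2` (it is not executed here)."""
    cmd = ["augur", "export", "v2", "--tree", tree,
           "--node-data", *node_jsons, "--output", output]
    if auspice_config:
        cmd += ["--auspice-config", auspice_config]
    return cmd


# Input paths mirror the local test command; the output path is hypothetical.
cmd = export_command(
    tree="results/hmpxv1/tree.nwk",
    node_jsons=["results/hmpxv1/nt_muts.json", "mutation_context.json"],
    output="auspice/mpox.json",
)
print(shlex.join(cmd))
print(add_coloring({}))
```

The command is only printed, so it can be inspected or pasted into a shell; passing the list to `subprocess.run` would execute it.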

emily-smith1 commented 3 months ago

This has been tested successfully by a public health partner, who suggested we include detailed documentation on what the numbers in the legend represent and an example use case.
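
As a starting point for that documentation, here is a simplified illustration of what such a fraction could mean. It assumes the value is the share of a node's nucleotide substitutions that are G->A or C->T, with mutations written as reference base, position, new base (for example "G123A"). This is an assumption for illustration only: the Nextstrain script may count differently (for example by also considering dinucleotide context), so the definition should be checked against mutation_context.py before it is documented.

```python
def ga_ct_fraction(muts):
    """Share of nucleotide substitutions in `muts` that are G->A or C->T.

    Returns None when there are no A/C/G/T substitutions to count.
    """
    total = 0
    hits = 0
    for mut in muts:
        if len(mut) < 3:
            continue  # too short to be <ref><pos><alt>
        ref, alt = mut[0], mut[-1]
        # Only count base-to-base changes; skip gaps and ambiguous bases.
        if ref in "ACGT" and alt in "ACGT":
            total += 1
            if (ref, alt) in {("G", "A"), ("C", "T")}:
                hits += 1
    return hits / total if total else None


# Hypothetical mutations on one branch: two of the four fit the pattern.
print(ga_ct_fraction(["G123A", "C456T", "A789G", "T1011C"]))
```

Under this reading, a value near 1 means almost all substitutions on that branch match the G->A or C->T pattern, and a value near 0 means few do.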