Add mutation context to Augur_PHB for Mpox

theiagen / public_health_bioinformatics

Bioinformatics workflows for genomic characterization, submission preparation, and genomic epidemiology of pathogens of public health concern.

GNU General Public License v3.0


Add mutation context to Augur_PHB for Mpox #499

Closed by emily-smith1 2 months ago

emily-smith1 commented 5 months ago


:pushpin: Explain the Request

Add mutation context to the Mpox track of the Augur_PHB workflow.

:books: Context

In Nextstrain, users have the option to annotate the Mpox tree with the G->A or C->T fraction. Previous studies have determined that these mutations may be indicative of APOBEC3 editing, which has played a role in sustained human-to-human transmission of Mpox clade IIb. A public health laboratory has requested that we add this feature to better monitor these changes.
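To make the annotated quantity concrete, here is a minimal sketch of a per-branch G->A or C->T fraction. It assumes mutations are written as strings such as "G1234A" (the form used in the "muts" lists of augur ancestral output). The function name `ga_ct_fraction` is hypothetical, and the sketch does not attempt to reproduce the Nextstrain script, which may apply further criteria.

```python
# Hypothetical helper: fraction of substitutions on a branch that are G->A or C->T.
# Assumes each mutation is a string like "G1234A" (reference base, position, new base).

APOBEC3_LIKE = {("G", "A"), ("C", "T")}


def ga_ct_fraction(muts):
    """Return the G->A / C->T share of nucleotide substitutions, or None if there are none."""
    # Keep only plain nucleotide substitutions; skip gaps, Ns and malformed entries.
    subs = [
        (m[0], m[-1])
        for m in muts
        if len(m) >= 3 and m[0] in "ACGT" and m[-1] in "ACGT"
    ]
    if not subs:
        return None
    hits = sum(1 for pair in subs if pair in APOBEC3_LIKE)
    return hits / len(subs)


# Two of these four substitutions are G->A or C->T.
example_fraction = ga_ct_fraction(["G100A", "C200T", "A300G", "T5C"])
```

A branch with no usable substitutions returns None rather than 0, so that "no mutations" is not confused with "no G->A or C->T mutations".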

:chart_with_upwards_trend: Desired Behavior

These data are added to the auspice_input_json file that is output from the Augur_PHB workflow, so users can annotate by this field when viewing the phylogeny in Auspice.

:information_source: Additional Information

Helpful info from the Nextstrain team:

From our monkeypox workflow, the script takes a newick tree and an "nt_muts.json" file (view code).

The nt_muts.json file is generated by an “augur ancestral” command (view).

After that, you'll need to pull out the GA_CT_fraction for each node, either by modifying the script or by writing a new one (attached):

json_tsv.py.txt
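The contents of the attached script are not shown in this thread. As a rough sketch of the idea only, the following pulls one field per node out of a node-data JSON and writes it as TSV. It assumes the usual augur node-data layout, a top-level "nodes" object keyed by node name, and that the field is named GA_CT_fraction as mentioned above. The function name `node_field_to_tsv` and the example node names are hypothetical.

```python
import csv
import io
import json


def node_field_to_tsv(node_data, field="GA_CT_fraction"):
    """Return TSV text with one row per node that carries `field`."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["node", field])
    # Assumed layout: {"nodes": {"<node name>": {"<field>": value, ...}, ...}}
    for name, attrs in node_data.get("nodes", {}).items():
        if field in attrs:
            writer.writerow([name, attrs[field]])
    return out.getvalue()


# Tiny in-memory example; in practice this would be json.load() on mutation_context.json.
example = json.loads(
    '{"nodes": {"NODE_0000001": {"GA_CT_fraction": 0.75}, "sampleA": {}}}'
)
tsv_text = node_field_to_tsv(example)
```

Nodes without the field are skipped rather than written with an empty value.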

kapsakcj commented 4 months ago

script here: https://github.com/nextstrain/mpox/blob/master/nextclade/scripts/mutation_context.py

- Test manually (outside a WDL workflow) to confirm this is feasible, and create an auspice_input_json file with the 2 new desired options (view in Auspice to confirm)
- Find a suitable docker image, or create a new one, to run the mutation_context.py script
- Create a WDL task for running the script; test
- Add this task to the augur workflow; test
- Pass the output of this new task, which should be a mutation_context.json file, into the augur export command that produces the final auspice_input_json file

kapsakcj commented 4 months ago

to test locally:

inputs required:
- augur_refine.refined_tree from augur refine
- augur_ancestral.ancestral_nt_muts_json from augur ancestral
- mutation_context.py from here: https://github.com/nextstrain/mpox/blob/master/nextclade/scripts/mutation_context.py

test with this command:

python3 scripts/mutation_context.py \
  --tree results/hmpxv1/tree.nwk \
  --mutations results/hmpxv1/nt_muts.json \
  --output mutation_context.json

This should output a mutation_context.json file

Can skip augur_translate step

Then test augur export with mutation_context.json added as an input for node_jsons
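For the export step, a sketch of how the extra node-data file could be passed along, assuming the local test uses `augur export v2` with its `--tree`, `--node-data` and `--output` options. The helper name `build_export_command` and the output file name auspice.json are hypothetical, and any metadata or coloring configuration the real run needs is not shown.

```python
def build_export_command(tree, node_data_files, output):
    """Build (but do not run) an augur export v2 argument list."""
    return [
        "augur", "export", "v2",
        "--tree", tree,
        # --node-data accepts several JSON files; mutation_context.json is simply one more.
        "--node-data", *node_data_files,
        "--output", output,
    ]


cmd = build_export_command(
    tree="results/hmpxv1/tree.nwk",
    node_data_files=["results/hmpxv1/nt_muts.json", "mutation_context.json"],
    output="auspice.json",  # hypothetical output name
)
# To actually run it: subprocess.run(cmd, check=True), with augur installed.
```

Building the command as a list keeps file paths with spaces intact and makes it easy to check which node-data files are included before running anything.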

emily-smith1 commented 3 months ago

This has been tested successfully by a public health partner, who suggested we include detailed documentation on what the numbers in the legend represent and an example use case.