nextstrain / seasonal-flu

Scripts, config, and snakefiles for seasonal-flu nextstrain builds

Add reference outgroup for 2y/6m builds #89

Closed joverlee521 closed 2 years ago

joverlee521 commented 2 years ago

Description of proposed changes

Add reference outgroup for 2y and 6m builds to avoid incorrect rooting of trees. The reference is pruned from the tree after the refine step using James' code snippet¹.
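To illustrate the idea of pruning the outgroup once it has done its job of rooting the tree, here is a minimal sketch on a toy tree. It is not the linked snippet and does not use augur or any tree library; the nested-dict tree layout, the `prune_tip` and `tip_names` helpers, and the strain names are all hypothetical, and branch lengths are ignored.

```python
def prune_tip(node, tip_name):
    """Return a copy of the tree without the tip `tip_name` (None if nothing is left)."""
    children = node.get("children")
    if not children:
        # Leaf: drop it if it is the tip to prune, otherwise keep a copy.
        return None if node["name"] == tip_name else dict(node)
    kept = [c for c in (prune_tip(child, tip_name) for child in children) if c is not None]
    if not kept:
        return None
    if len(kept) == 1:
        # An internal node left with a single child is redundant, so collapse it.
        return kept[0]
    return {"name": node.get("name"), "children": kept}


def tip_names(node):
    """List the tip names of the tree in left-to-right order."""
    children = node.get("children")
    if not children:
        return [node["name"]]
    return [name for child in children for name in tip_names(child)]


# Hypothetical tree rooted on the reference: (reference, ((A, B), C))
rooted = {
    "name": "root",
    "children": [
        {"name": "reference"},
        {"name": "ingroup", "children": [
            {"name": "cherry", "children": [{"name": "A"}, {"name": "B"}]},
            {"name": "C"},
        ]},
    ],
}

pruned = prune_tip(rooted, "reference")
print(tip_names(pruned))
```

After pruning, the old root has only one child left, so the ingroup clade becomes the new root and the rooting chosen with the help of the reference is preserved.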

The references added are the references used by Nextclade, but with the strain names changed to match the strain name within fauna.

¹ https://github.com/nextstrain/augur/issues/340#issuecomment-545184212

(There's a bunch of path changes, but the main change is in Snakefile_base)

Related issue(s)

Related to https://github.com/nextstrain/augur/issues/340

Testing

joverlee521 commented 2 years ago

Created this as a fix for the rooting problem we see in h1n1pdm builds via the master branch. @huddlej, not sure if we should add a more robust version of this in #76?

joverlee521 commented 2 years ago

I had to make an additional commit post-merge to fix the WHO builds.

I ran into an AmbiguousRuleException:

AmbiguousRuleException:
Rules prune_reference and refine are ambiguous for the file results/tree_pruned_vidrl_h3n2_ha_6y_cell_fra.nwk.
Consider starting rule output with a unique prefix, constrain your wildcards, or use the ruleorder directive.
Wildcards:
    prune_reference: assay=fra,center=vidrl,lineage=h3n2,passage=cell,resolution=6y,segment=ha
    refine: assay=fra,center=pruned_vidrl,lineage=h3n2,passage=cell,resolution=6y,segment=ha
Expected input files:
    prune_reference: results/tree_vidrl_h3n2_ha_6y_cell_fra.nwk
    refine: results/tree-raw_pruned_vidrl_h3n2_ha_6y_cell_fra.nwk results/aligned_pruned_vidrl_h3n2_ha_6y_cell_fra.fasta results/metadata_h3n2_ha.tsv
Expected output files:
    prune_reference: results/tree_pruned_vidrl_h3n2_ha_6y_cell_fra.nwk
    refine: results/tree_pruned_vidrl_h3n2_ha_6y_cell_fra.nwk results/branch-lengths_pruned_vidrl_h3n2_ha_6y_cell_fra.json

I assume there's some sort of pattern matching (or maybe a lack thereof?) that is making this happen: center=pruned_vidrl
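The match on `center=pruned_vidrl` can be reproduced outside Snakemake. The sketch below assumes that the refine rule's output pattern has the shape `results/tree_{center}_{lineage}_{segment}_{resolution}_{passage}_{assay}.nwk` (inferred from the file names in the error, not copied from the Snakefile) and relies on Snakemake's default of matching an unconstrained wildcard with the regex `.+`. The `match_wildcards` helper is hypothetical and only imitates that behaviour with Python's `re` module.

```python
import re

# Assumed shape of the refine rule's output (inferred from the error message).
REFINE_OUTPUT = "results/tree_{center}_{lineage}_{segment}_{resolution}_{passage}_{assay}.nwk"
TARGET = "results/tree_pruned_vidrl_h3n2_ha_6y_cell_fra.nwk"


def match_wildcards(pattern, path, constraints=None):
    """Hypothetical helper: match `path` against a Snakemake-style pattern.

    Each {wildcard} becomes a named group using its constraint, or `.+` if
    it has none. Returns the wildcard values, or None if there is no match.
    """
    constraints = constraints or {}
    parts, pos = [], 0
    for m in re.finditer(r"\{(\w+)\}", pattern):
        parts.append(re.escape(pattern[pos:m.start()]))
        name = m.group(1)
        parts.append("(?P<%s>%s)" % (name, constraints.get(name, ".+")))
        pos = m.end()
    parts.append(re.escape(pattern[pos:]))
    found = re.fullmatch("".join(parts), path)
    return found.groupdict() if found else None


# Unconstrained: `.+` is free to swallow the underscore, so center=pruned_vidrl.
print(match_wildcards(REFINE_OUTPUT, TARGET))

# Forbidding underscores in every wildcard removes the match for refine.
STRICT = {name: "[^_]+" for name in
          ("center", "lineage", "segment", "resolution", "passage", "assay")}
print(match_wildcards(REFINE_OUTPUT, TARGET, STRICT))

# A distinct prefix for the pruned tree (hypothetical name) also avoids the clash.
print(match_wildcards(REFINE_OUTPUT, "results/tree-pruned_vidrl_h3n2_ha_6y_cell_fra.nwk"))
```

These correspond to two of the remedies the error message suggests: constraining the wildcards or giving the rule output a unique prefix. The third, a `ruleorder` directive, tells Snakemake which rule to prefer without changing any pattern.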