An important part of the pipeline is excluding some mutations that are masked in the pre-built UShER mutation-annotated tree or otherwise problematic.
The plot that informs manual identification of mutations (specified in `config.yaml`) to exclude is buried in the `synonymous_mut_rates` notebook in pipeline, and then mutations to exclude are manually specified.

In addition, Angie sometimes masks sites in a clade-specific fashion when she builds the mutation-annotated tree. She does these exclusions in a bash script, which I manually converted to a machine readable YAML at some point (https://github.com/jbloomlab/SARS2-mut-fitness/blob/main/data/usher_masked_sites.yaml).
But I am not sure if I thought to re-check to see if she had added more mutations to her bash script when I updated the mutation-annotated tree.
So if we want to re-run the pipeline regularly on new mutation-annotated trees, we definitely need to automate this. However, that might be a lot of effort.
Assuming we don't automate, somehow when data is updated we need to check if there are new masked or excluded sites. Better instructions on how to do this would probably help.
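As a starting point for that manual check, here is a rough sketch of how one might diff two versions of the masked-site list. It assumes each version has already been parsed into a dict mapping a clade name to a list of masked sites; that layout is an assumption for illustration and may not match how `usher_masked_sites.yaml` is actually structured. The function name `new_masked_sites` and the example data are hypothetical.

```python
def new_masked_sites(old, new):
    """Return {clade: sorted sites} for sites masked in `new` but not in `old`.

    Both arguments are assumed to map a clade name to an iterable of
    masked sites; this layout is hypothetical, not the real file format.
    """
    added = {}
    for clade, sites in new.items():
        # Clades absent from `old` are treated as having no masked sites.
        extra = set(sites) - set(old.get(clade, ()))
        if extra:
            added[clade] = sorted(extra)
    return added


# Made-up example data, not the real masked sites.
old_masks = {"all": [100, 200], "cladeA": [300]}
new_masks = {"all": [100, 200, 250], "cladeA": [300], "cladeB": [400]}
print(new_masked_sites(old_masks, new_masks))
# prints {'all': [250], 'cladeB': [400]}
```

An empty result would suggest nothing new was masked since the last update. Sites that were dropped from the masks could be found by swapping the two arguments.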
(This issue recaps issues mentioned by @rneher on our internal Slack when he flagged site 18591 as unusually "mutated" in recent strains.)
See also following related issues on UShER: