Sometimes running the phylo workflow with the CI configs locally includes the update_example_data rule in the DAG:
$ nextstrain build . --configfile profiles/ci/builds.yaml -n
Building DAG of jobs...
Job stats:
job                            count
---------------------------  -------
align                              1
all                                1
ancestral                          1
clades                             1
colors                             1
combine_samples                    1
copy_example_data                  1
decompress                         1
download                           1
export                             1
filter                             1
final_strain_name                  1
fix_tree                           1
mask                               1
mutation_context                   1
recency                            1
refine                             1
rename_clades                      1
reverse_reverse_complements        1
subsample                          2
traits                             1
translate                          1
tree                               1
update_example_data                1
total                             25
...
Reasons:
(check individual jobs above for details)
code has changed since last execution:
decompress
input files updated by another job:
align, all, ancestral, clades, colors, combine_samples, copy_example_data, decompress, export, filter, final_strain_name, fix_tree, mask, mutation_context, recency, refine, rename_clades, reverse_reverse_complements, subsample, traits, translate, tree, update_example_data
missing output files:
download
set of input files has changed since last execution:
decompress
Some jobs were triggered by provenance information, see 'reason' section in the rule displays above.
If you prefer that only modification time is used to determine whether a job shall be executed, use the command line option '--rerun-triggers mtime' (also see --help).
If you are sure that a change for a certain output file (say, <outfile>) won't change the result (e.g. because you just changed the formatting of a script or environment definition), you can also wipe its metadata to skip such a trigger via 'snakemake --cleanup-metadata <outfile>'.
Rules with provenance triggered jobs: decompress
This is not an issue in our automated CI runs via GitHub Actions because the workflow does a clean clone of the repo, so there is no leftover .snakemake directory from earlier runs.
Possible solutions
Manually remove the local .snakemake directory. This clears Snakemake's stored metadata from previous runs and resolves the issue.
Move the chores.smk file to be conditionally included in the core phylo workflow
Move the chores.smk file to a separate build-config that extends the workflow with custom_rules (conforms to the pathogen-repo-guide)
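
As a stopgap for the first option, the manual cleanup could be wrapped in a small shell helper. This is only a sketch: the function name clear_snakemake_metadata is made up for illustration, and it assumes the workflow is run from the directory that contains .snakemake.

```shell
# Hypothetical helper (name is illustrative, not part of the repo):
# remove Snakemake's local .snakemake directory for a workflow directory.
# Usage: clear_snakemake_metadata [workflow_dir]   (defaults to the current directory)
clear_snakemake_metadata() {
    _workdir="${1:-.}"
    if [ -d "$_workdir/.snakemake" ]; then
        # Deletes metadata, logs, and anything else Snakemake stored here.
        rm -rf -- "$_workdir/.snakemake"
        echo "Removed $_workdir/.snakemake"
    else
        echo "No .snakemake directory found in $_workdir"
    fi
}
```

After calling clear_snakemake_metadata in the phylo directory, re-running the dry run from above (nextstrain build . --configfile profiles/ci/builds.yaml -n) should show whether update_example_data is still pulled into the DAG. This only works around the symptom locally; the other two options change the workflow itself.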