nextstrain / mpox

Nextstrain build for mpox virus
https://nextstrain.org/mpox
MIT License

[phylo] CI workflow DAG includes `update_example_data` #237

Open joverlee521 opened 4 months ago

joverlee521 commented 4 months ago

Context

Running the phylo workflow locally with the CI configs sometimes includes the `update_example_data` rule in the DAG:

$ nextstrain build . --configfile profiles/ci/builds.yaml -n
Building DAG of jobs...
Job stats:
job                            count
---------------------------  -------
align                              1
all                                1
ancestral                          1
clades                             1
colors                             1
combine_samples                    1
copy_example_data                  1
decompress                         1
download                           1
export                             1
filter                             1
final_strain_name                  1
fix_tree                           1
mask                               1
mutation_context                   1
recency                            1
refine                             1
rename_clades                      1
reverse_reverse_complements        1
subsample                          2
traits                             1
translate                          1
tree                               1
update_example_data                1
total                             25
...
Reasons:
    (check individual jobs above for details)
    code has changed since last execution:
        decompress
    input files updated by another job:
        align, all, ancestral, clades, colors, combine_samples, copy_example_data, decompress, export, filter, final_strain_name, fix_tree, mask, mutation_context, recency, refine, rename_clades, reverse_reverse_complements, subsample, traits, translate, tree, update_example_data
    missing output files:
        download
    set of input files has changed since last execution:
        decompress
Some jobs were triggered by provenance information, see 'reason' section in the rule displays above.
If you prefer that only modification time is used to determine whether a job shall be executed, use the command line option '--rerun-triggers mtime' (also see --help).
If you are sure that a change for a certain output file (say, <outfile>) won't change the result (e.g. because you just changed the formatting of a script or environment definition), you can also wipe its metadata to skip such a trigger via 'snakemake --cleanup-metadata <outfile>'. 
Rules with provenance triggered jobs: decompress

This is not an issue in our automated CI runs via GitHub Actions because the GitHub Actions workflow does a clean clone of the repo.

Possible solutions

  1. Manually remove the local .snakemake directory. This clears the Snakemake cache and resolves the issue.
  2. Change the core phylo workflow so that the chores.smk file is only conditionally included.
  3. Move the chores.smk file to a separate build-config that extends the workflow with custom_rules (conforms to the pathogen-repo-guide).
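
As a minimal sketch of option 1, assuming the commands are run from the same phylo workflow directory as the dry run above: remove the local .snakemake directory, then repeat the dry run to check whether update_example_data still appears in the job stats. The check for the nextstrain executable is only an illustrative guard and not part of the workaround itself.

```shell
# Option 1 (sketch): remove Snakemake's local metadata directory.
# Assumes the current directory is the phylo workflow directory.
rm -rf .snakemake

# Repeat the dry run from the issue, if the Nextstrain CLI is on PATH,
# and check the "Job stats" table for update_example_data.
if command -v nextstrain >/dev/null 2>&1; then
    nextstrain build . --configfile profiles/ci/builds.yaml -n
fi
```

This only clears local state, so the rule may show up again after later local runs. Options 2 and 3 change the workflow itself instead.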