tsibley closed 1 year ago
Confirmed it's getting the right package.
The two failing jobs look like issues with those pathogen repos, but they don't fail with the Docker runtime… so hmm.
ncov fails in augur export v2 with:
ERROR: results/europe/rbd_levels.json did not contain either `nodes` or `branches`. Please check the formatting of this JSON!
This was also recently reported by a user. So something's up here… Conda runtime is a common factor.
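To narrow down which node-data file trips this error, a quick local check along these lines might help. This is only a sketch based on the wording of the error message (a top-level `nodes` or `branches` key is expected); it is not Augur's actual validation code, and the helper names are made up for illustration.

```python
import json


def has_node_data_keys(path):
    """Return True if the JSON file has a top-level `nodes` or `branches` key.

    Assumption: this mirrors the wording of the error message only,
    not Augur's real implementation.
    """
    with open(path) as fh:
        data = json.load(fh)
    return isinstance(data, dict) and ("nodes" in data or "branches" in data)


def find_suspect_files(paths):
    """Return the paths whose JSON lacks both expected top-level keys."""
    return [str(path) for path in paths if not has_node_data_keys(path)]
```

Running `find_suspect_files` over the files passed to `--node-data` would show whether results/europe/rbd_levels.json is the only one affected under the Conda runtime.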
seasonal-flu fails with
Traceback (most recent call last):
File "/home/runner/work/conda-base/conda-base/scripts/annotate_haplotypes.py", line 62, in <module>
if clade == "unassigned" or sequence_by_node[node.name] == sequence_by_clade[clade]:
KeyError: '3C.2'
Those failures should be investigated, but they shouldn't block merging this PR.
The seasonal-flu issue was caused by the Augur 22.0.0 change to augur clades output and was resolved by 42a351f.
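For anyone debugging a similar KeyError, one way to see which clade labels lack an entry in the per-clade lookup is sketched below. The dictionary name `sequence_by_clade` comes from the traceback; `clade_by_node` and the function name are hypothetical, and this is a diagnostic aid, not the change made in 42a351f.

```python
def missing_clade_labels(clade_by_node, sequence_by_clade):
    """Return sorted clade labels assigned to nodes but absent from the lookup.

    clade_by_node: hypothetical mapping of node name -> clade label.
    sequence_by_clade: mapping of clade label -> sequence, as in the traceback.
    """
    return sorted(
        {
            clade
            for clade in clade_by_node.values()
            # "unassigned" is skipped, matching the short-circuit in the script.
            if clade != "unassigned" and clade not in sequence_by_clade
        }
    )
```

Any label this returns (for example '3C.2' in the traceback above) would raise the same KeyError when used as a key into `sequence_by_clade`.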
Excellent work @tsibley! This is super helpful!
The ncov failure is here: https://github.com/nextstrain/conda-base/actions/runs/4961116874/jobs/8915514447#step:8:977
[Fri May 12 16:54:35 2023]
Job 4: Exporting data files for Auspice
Reason: Missing output files: results/europe/ncov_with_accessions.json, results/europe/ncov_with_accessions_root-sequence.json; Input files updated by another job: results/europe/logistic_growth.json, results/europe/colors.tsv, results/europe/tree.nwk, results/europe/epiweeks.json, results/europe/clades.json, results/europe/metadata_adjusted.tsv.xz, results/europe/branch_lengths.json, results/europe/nt_muts.json, results/europe/mutational_fitness.json, results/europe/rbd_levels.json, results/europe/recency.json, results/europe/distances.json, results/europe/description.md, results/europe/auspice_config.json, results/europe/traits.json, results/europe/emerging_lineages.json, results/europe/aa_muts.json
augur export v2 --tree results/europe/tree.nwk --metadata results/europe/metadata_adjusted.tsv.xz --node-data results/europe/branch_lengths.json results/europe/nt_muts.json results/europe/aa_muts.json results/europe/emerging_lineages.json results/europe/clades.json results/europe/recency.json results/europe/traits.json results/europe/logistic_growth.json results/europe/mutational_fitness.json results/europe/distances.json results/europe/epiweeks.json results/europe/rbd_levels.json --auspice-config results/europe/auspice_config.json --include-root-sequence --colors results/europe/colors.tsv --lat-longs defaults/lat_longs.tsv --title 'Genomic epidemiology of novel coronavirus - Europe-focused subsampling' --description results/europe/description.md --output results/europe/ncov_with_accessions.json 2>&1 | tee logs/export_europe.txt
ERROR: results/europe/rbd_levels.json did not contain either `nodes` or `branches`. Please check the formatting of this JSON!
Validating schema of 'results/europe/nt_muts.json'...
Validating schema of 'results/europe/aa_muts.json'...
@huddlej your fix does resolve it. I just reran the job and it now fails only for ncov, no longer for seasonal-flu.
Aha, the reason --docker doesn't fail here is that the latest Docker image is still at 21.1.0; see https://github.com/nextstrain/docker-base/pull/155
[ Commit message based on that of 12000a20 in nextstrain/docker-base.¹ Code changes also based on that commit, plus subsequent commits.² ]
A useful check for whether new packages will break our pathogen builds.
I included all pathogen repos that already use our pathogen-repo-ci reusable workflow. It should be minimal effort to maintain this list over time—I expect it to only grow—but perhaps in the future we will want to abstract it out into a shared list of known pathogen repos.
I don't like that we have to copy the build-args for a few of the repos here since it'll be easy for this copy to diverge from the repo's authoritative build-args, but it's necessary for now. Over time as we work towards increased automation of pathogen builds, I think we can get rid of this build-args copy by further standardizing how each repo configures itself for automation. For example, instead of specifying build-args in a repo's CI workflow, the args for CI could be stored in a broader workflow metadata file (e.g. nextstrain-workflow.yaml) read by pathogen-repo-ci, or defined by some other convention.
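As a rough illustration of the metadata-file idea, the sketch below pulls CI build args out of an already-parsed metadata mapping. Everything about the layout is an assumption: the `ci` and `build-args` keys are hypothetical, no such schema exists yet, and parsing of the YAML file itself is left out.

```python
import shlex


def ci_build_args(metadata):
    """Return the CI build args as a single shell-quoted string.

    metadata: dict parsed from a hypothetical nextstrain-workflow.yaml.
    The `ci` / `build-args` keys are illustrative, not an existing convention.
    """
    ci_section = metadata.get("ci") or {}
    args = ci_section.get("build-args") or []
    # Accept either a single string or a list of arguments.
    if isinstance(args, str):
        args = shlex.split(args)
    return shlex.join(args)
```

With something like this, both the repo's own CI workflow and the check here could read the same source of truth, so the copies could no longer diverge.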
An alternative to directly running pathogen-repo-ci against each repo here would be instead triggering the CI workflows themselves within each repo. The downside to that is it would divorce the outcomes of those workflows from this one and render them not visible from PRs in this repo. It would also require updates to each repo to support triggering and passing in of additional parameters (i.e. for the package). And finally those CI workflows sometimes run other jobs, like linting and other integration tests (e.g. with Cram), that aren't always necessary to run with a new package.
Related-to: https://github.com/nextstrain/docker-base/pull/148
Related-to: https://github.com/nextstrain/docker-base/pull/150
Related-to: https://github.com/nextstrain/docker-base/pull/151
Related-to: https://github.com/nextstrain/docker-base/pull/154
¹ https://github.com/nextstrain/docker-base/commit/12000a20
² https://github.com/nextstrain/docker-base/commit/bc22a0bc
  https://github.com/nextstrain/docker-base/commit/0a20a474
  https://github.com/nextstrain/docker-base/commit/75254e92
Testing