nextstrain / conda-base

Conda package build for nextstrain-base
https://anaconda.org/Nextstrain/nextstrain-base

ci: Test pathogen repo CI builds with the final packages #27

Closed tsibley closed 1 year ago

tsibley commented 1 year ago

[ Commit message based on that of 12000a20 in nextstrain/docker-base.¹ Code changes also based on that commit, plus subsequent commits.² ]

A useful check for whether new packages will break our pathogen builds.

I included all pathogen repos that already use our pathogen-repo-ci reusable workflow. It should be minimal effort to maintain this list over time—I expect it to only grow—but perhaps in the future we will want to abstract it out into a shared list of known pathogen repos.

I don't like that we have to copy the build-args for a few of the repos here since it'll be easy for this copy to diverge from the repo's authoritative build-args, but it's necessary for now. Over time as we work towards increased automation of pathogen builds, I think we can get rid of this build-args copy by further standardizing how each repo configures itself for automation. For example, instead of specifying build-args in a repo's CI workflow, the args for CI could be stored in a broader workflow metadata file (e.g. nextstrain-workflow.yaml) read by pathogen-repo-ci, or defined by some other convention.
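To make the metadata-file idea above concrete, here is a minimal sketch of how a tool like pathogen-repo-ci might pull CI build-args out of such a file. Everything in it is an assumption for illustration: the `ci` / `build-args` layout, the `ci_build_args` helper, and the example argument values are all made up, and the file's parsed contents are represented by a plain dict rather than read from a real nextstrain-workflow.yaml.

```python
import shlex
from typing import Any


def ci_build_args(metadata: dict[str, Any]) -> list[str]:
    """Return the CI build-args declared in a parsed workflow metadata mapping.

    Assumes a hypothetical layout: {"ci": {"build-args": "<shell-style string>"}}.
    A repo that declares nothing gets an empty argument list.
    """
    raw = metadata.get("ci", {}).get("build-args", "")
    return shlex.split(raw)


# Stand-in for the parsed contents of a hypothetical nextstrain-workflow.yaml.
# The argument values are placeholders, not any repo's real build-args.
example_metadata = {
    "ci": {"build-args": "--configfile ci/builds.yaml --cores 2"},
}

print(ci_build_args(example_metadata))
print(ci_build_args({}))
```

With something along these lines, the list of repos in this workflow would only need repo names, and each repo would stay the single source of truth for its own args.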

An alternative to directly running pathogen-repo-ci against each repo here would be to trigger each repo's own CI workflow instead. The downside is that it would divorce the outcomes of those workflows from this one and make them invisible from PRs in this repo. It would also require updates to each repo to support triggering and passing in additional parameters (i.e. for the package). And finally, those CI workflows sometimes run other jobs, like linting and other integration tests (e.g. with Cram), that aren't always necessary to run with a new package.

Related-to: https://github.com/nextstrain/docker-base/pull/148
Related-to: https://github.com/nextstrain/docker-base/pull/150
Related-to: https://github.com/nextstrain/docker-base/pull/151
Related-to: https://github.com/nextstrain/docker-base/pull/154

¹ https://github.com/nextstrain/docker-base/commit/12000a20
² https://github.com/nextstrain/docker-base/commit/bc22a0bc
  https://github.com/nextstrain/docker-base/commit/0a20a474
  https://github.com/nextstrain/docker-base/commit/75254e92

Testing

tsibley commented 1 year ago

Confirmed it's getting the right package.

tsibley commented 1 year ago

Two failing jobs seem to be issues with those pathogen repos? but they don't fail with the Docker runtime… so hmm.

ncov fails in augur export v2 with

ERROR: results/europe/rbd_levels.json did not contain either `nodes` or `branches`. Please check the formatting of this JSON!

This was also recently reported by a user. So something's up here… Conda runtime is a common factor.
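For anyone wanting to reproduce this outside of CI, a quick way to check a node-data JSON for the condition named in the error is sketched below. It only mirrors the wording of the error message (neither `nodes` nor `branches` at the top level); it is not Augur's actual validation code, and the helper names are made up.

```python
import json
from pathlib import Path


def lacks_node_data(data) -> bool:
    """True if parsed JSON has neither a top-level `nodes` nor `branches` key."""
    if not isinstance(data, dict):
        return True
    return "nodes" not in data and "branches" not in data


def file_lacks_node_data(path) -> bool:
    """Load the JSON file at `path` and apply the same check."""
    return lacks_node_data(json.loads(Path(path).read_text()))
```

Running `file_lacks_node_data("results/europe/rbd_levels.json")` in a failed build directory would show whether the file really is missing both keys or whether something else is going on.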

seasonal-flu fails with

Traceback (most recent call last):
  File "/home/runner/work/conda-base/conda-base/scripts/annotate_haplotypes.py", line 62, in <module>
    if clade == "unassigned" or sequence_by_node[node.name] == sequence_by_clade[clade]:
KeyError: '3C.2'

tsibley commented 1 year ago

Those failures should be investigated, but they shouldn't block merging this PR.

huddlej commented 1 year ago

The seasonal-flu issue was caused by the Augur 22.0.0 change to augur clades output and was resolved by 42a351f.
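As a diagnostic for this kind of `KeyError`, the sketch below lists clade labels that nodes refer to but that are absent from the clade lookup. It is not what 42a351f does. `sequence_by_clade` is the name from the traceback, while `clade_by_node` and the `missing_clades` helper are hypothetical.

```python
def missing_clades(clade_by_node: dict, sequence_by_clade: dict) -> list:
    """Return clade labels assigned to nodes but missing from sequence_by_clade.

    "unassigned" is skipped because the traceback's condition handles it
    before the dictionary lookup.
    """
    return sorted(
        {
            clade
            for clade in clade_by_node.values()
            if clade != "unassigned" and clade not in sequence_by_clade
        }
    )


# Toy data: node names and sequences here are invented for illustration.
print(missing_clades(
    {"node_a": "3C.2", "node_b": "unassigned", "node_c": "3C.3"},
    {"3C.3": "ACGT"},
))
```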

corneliusroemer commented 1 year ago

Excellent work @tsibley! This is super helpful!

The ncov failure is here: https://github.com/nextstrain/conda-base/actions/runs/4961116874/jobs/8915514447#step:8:977

[Fri May 12 16:54:35 2023]
Job 4: Exporting data files for Auspice
Reason: Missing output files: results/europe/ncov_with_accessions.json, results/europe/ncov_with_accessions_root-sequence.json; Input files updated by another job: results/europe/logistic_growth.json, results/europe/colors.tsv, results/europe/tree.nwk, results/europe/epiweeks.json, results/europe/clades.json, results/europe/metadata_adjusted.tsv.xz, results/europe/branch_lengths.json, results/europe/nt_muts.json, results/europe/mutational_fitness.json, results/europe/rbd_levels.json, results/europe/recency.json, results/europe/distances.json, results/europe/description.md, results/europe/auspice_config.json, results/europe/traits.json, results/europe/emerging_lineages.json, results/europe/aa_muts.json

        augur export v2 \
            --tree results/europe/tree.nwk \
            --metadata results/europe/metadata_adjusted.tsv.xz \
            --node-data results/europe/branch_lengths.json results/europe/nt_muts.json results/europe/aa_muts.json results/europe/emerging_lineages.json results/europe/clades.json results/europe/recency.json results/europe/traits.json results/europe/logistic_growth.json results/europe/mutational_fitness.json results/europe/distances.json results/europe/epiweeks.json results/europe/rbd_levels.json \
            --auspice-config results/europe/auspice_config.json \
            --include-root-sequence \
            --colors results/europe/colors.tsv \
            --lat-longs defaults/lat_longs.tsv \
            --title 'Genomic epidemiology of novel coronavirus - Europe-focused subsampling' \
            --description results/europe/description.md \
            --output results/europe/ncov_with_accessions.json 2>&1 | tee logs/export_europe.txt

ERROR: results/europe/rbd_levels.json did not contain either `nodes` or `branches`. Please check the formatting of this JSON!
Validating schema of 'results/europe/nt_muts.json'...
Validating schema of 'results/europe/aa_muts.json'...

@huddlej your fix does resolve it: I just reran the job and it now fails only for ncov, no longer for seasonal-flu.

corneliusroemer commented 1 year ago

Aha, the reason --docker doesn't fail here is that the latest Docker image is still at Augur 21.1.0, see https://github.com/nextstrain/docker-base/pull/155
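The version reasoning here can be written down as a tiny check. This is only a sketch, assuming the 21.1.0 above refers to the Augur version in the image and that versions are plain dotted integers. The helper names are made up.

```python
def parse_version(version: str) -> tuple:
    """Turn a dotted version like "21.1.0" into a comparable tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def predates(installed: str, breaking: str) -> bool:
    """True if `installed` is older than the release that introduced a change."""
    return parse_version(installed) < parse_version(breaking)


# A runtime still on 21.1.0 would not see a behavior change made in 22.0.0.
print(predates("21.1.0", "22.0.0"))
```

A runtime for which `predates(...)` is true would pass builds that rely on the old output, which matches what we see with --docker.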