populationgenomics / production-pipelines

Genomics workflows for CPG using Hail Batch

Cpg-utils migration hiccup #695

Open MattWellie opened 4 months ago

MattWellie commented 4 months ago

https://batch.hail.populationgenomics.org.au/batches/443824/jobs/400

ModuleNotFoundError: No module named 'cpg_utils.constants'

in AnnotateVcf (SV)

MattWellie commented 4 months ago

This should just go away if run on the latest cpg_workflows image

MattWellie commented 4 months ago

OK, there's something odd here: analysis-runner's default config thinks the latest version of cpg_workflows is 1.22.4 instead of 1.22.11. As a result, the wrong version of the cpg_workflows image is picked up for the Cromwell polling job, and that image might contain a non-functional mix of cpg-utils and analysis-runner: cpg_utils.constants doesn't exist in that image, but the analysis-runner version of Cromwell's watch task redirects to cpg-utils, expecting it to be there.

@illusional do you know why analysis-runner has such an old version of cpg-workflows as the current image? Running analysis-runner config --dataset DATASET -o local.toml produces a file containing:

cpg_workflows = "australia-southeast1-docker.pkg.dev/cpg-common/images/cpg_workflows:1.22.4"
illusional commented 4 months ago

I'd never seen this before: https://github.com/populationgenomics/images/blob/ce5ce6986b473b9c519bf3134eb8a7fd635d7f96/.github/workflows/prep_config.py#L15

https://github.com/populationgenomics/images/blob/ce5ce6986b473b9c519bf3134eb8a7fd635d7f96/.github/workflows/deploy_config.yaml#L180

MattWellie commented 4 months ago

Oof, ok.

This happened because the cpg-workflows config entry isn't generated from the production-pipelines repository; it's generated from the images repository and folded in with the rest of the image entries. That only happens when the images repository CI runs.

We also have cpg_workflows images built from this repository, so two separate CI pipelines build the same image (bad?) and only one of them updates the relevant config entry (bad.).

As a result the latest version in config became quite out of step with the latest version pushed from this repository.

I'm going to keep this open for now, as we should probably resolve this, but that will be a more focused new issue. FYI @vivbak
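
A drift check along these lines could flag the mismatch described above. This is only a sketch: the function names are hypothetical, it assumes tags are purely numeric dotted versions such as 1.22.4, and it does not say how the latest tag would be fetched from either CI pipeline.

```python
def version_tuple(tag: str) -> tuple[int, ...]:
    """Convert a dotted numeric tag such as '1.22.11' into (1, 22, 11)."""
    return tuple(int(part) for part in tag.split("."))


def is_stale(config_tag: str, latest_tag: str) -> bool:
    """True if the tag pinned in config is older than the latest pushed tag."""
    return version_tuple(config_tag) < version_tuple(latest_tag)


# The versions mentioned in this issue.
print(is_stale("1.22.4", "1.22.11"))  # True: the config entry is behind

# Plain string comparison gets this wrong, because "4" sorts after "1".
print("1.22.4" < "1.22.11")  # False
```

The numeric comparison matters here specifically: compared as strings, 1.22.4 would look newer than 1.22.11.
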

illusional commented 4 months ago

This is a bigger conversation for data office hours, but the CPG has some work to improve: