populationgenomics / production-pipelines

Genomics workflows for CPG using Hail Batch
MIT License

Move to `src/` layout? #813

Open MattWellie opened 5 days ago

MattWellie commented 5 days ago

Inspired by the Python packaging best practices @dancoates mentioned earlier in the week:

https://www.pyopensci.org/python-package-guide/package-structure-code/python-package-structure.html

We have a fundamental issue with our usage of `cpg_workflows` in cloud infrastructure:

Something as basic as moving to a `src/` layout could be a partial solution to this. We would never be in a position to accidentally run `from cpg_workflows import X` and pick up the cloned code instead of the installed code. That would keep the code that runs when creating a batch consistent with the code that runs inside a batch.

The issue with the flat layout we have is that you can accidentally run code you don't intend to: the `cpg_workflows/` directory sits at the repository root, so Python started from a checkout can import it ahead of the installed package. Normally, running in a container with pre-installed versions would avoid this, but analysis-runner clones the codebase into the running container before doing anything else, so we're right back at this vulnerability.
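To make the shadowing concrete, here is a minimal sketch of the difference between the two layouts. It is not how analysis-runner works internally: `demo_pkg`, `make_package`, `imported_from` and `compare_layouts` are hypothetical names, and a directory on `PYTHONPATH` stands in for the image's site-packages. It relies only on standard Python behaviour, where `python -c` puts the working directory ahead of other import locations.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

PKG = "demo_pkg"  # hypothetical stand-in for cpg_workflows


def make_package(root: Path) -> None:
    """Create an empty importable package called PKG under `root`."""
    pkg_dir = root / PKG
    pkg_dir.mkdir(parents=True)
    (pkg_dir / "__init__.py").write_text("")


def imported_from(checkout: Path, installed: Path) -> Path:
    """Start a fresh interpreter inside `checkout` and report where PKG came from."""
    env = {**os.environ, "PYTHONPATH": str(installed)}
    # Safe-path mode (Python 3.11+) would hide the effect being demonstrated.
    env.pop("PYTHONSAFEPATH", None)
    result = subprocess.run(
        [sys.executable, "-c", f"import {PKG}; print({PKG}.__file__)"],
        cwd=checkout,
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return Path(result.stdout.strip()).resolve()


def compare_layouts() -> dict:
    """Report which copy of PKG wins for a flat checkout and a src/ checkout."""
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        tmp_path = Path(tmp).resolve()
        installed = tmp_path / "site"  # assumption: plays the role of site-packages
        make_package(installed)
        for layout, subdir in (("flat", ""), ("src", "src")):
            checkout = tmp_path / f"checkout_{layout}"
            make_package(checkout / subdir)
            origin = imported_from(checkout, installed)
            results[layout] = "checkout" if checkout in origin.parents else "installed"
    return results


if __name__ == "__main__":
    print(compare_layouts())  # {'flat': 'checkout', 'src': 'installed'}
```

In the flat case the cloned copy shadows the installed one simply because the interpreter starts in the repository root. In the `src/` case the checkout contains no top-level `demo_pkg`, so the import falls through to the installed copy.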

This change would make the pipeline more cumbersome to run during development (to test new changes, you would need to push a new image and use it as the driver image), but it would add zero burden to code in main, and stability there should be the most important factor when we are trying to pin down exact versions of our tools and code.