opensafely-core / ehrql

ehrQL: the electronic health record query language for OpenSAFELY
https://docs.opensafely.org/ehrql/

Borrow Dockerfile improvements from cohortextractor #665

Closed evansd closed 2 years ago

evansd commented 2 years ago

Simon has done lots of work on the cohortextractor Dockerfile to give it nice caching/layering behaviour, support fast builds and produce compact production images without lots of build-time cruft. It also uses the base-action image rather than the base-docker image, which I don't think makes much difference right now but is important for maintaining consistency.

We should borrow the approach here for Databuilder.

cohortextractor Dockerfile: https://github.com/opensafely-core/cohortextractor/blob/main/Dockerfile

databuilder Dockerfile: https://github.com/opensafely-core/databuilder/blob/main/Dockerfile

Lots of this can be copied over directly with just changes in names. The major differences are:

rebkwok commented 2 years ago

Progress so far: I have 2 draft PRs that are similar but slightly different in what they do with the pyproject.toml.

I think the 2nd one is probably what we want, but both are there in case it's useful to see what I've tried so far:

1) https://github.com/opensafely-core/databuilder/pull/773 This follows the way cohort-extractor's Dockerfile works, including installing from the pyproject.toml and making the databuilder executable available. BUT - that didn't work with flit, so I had to make changes to the pyproject.toml file, which in turn means that pip-compile isn't working properly. So this builds the docker image correctly, afaict, it's pretty quick about it, and changes to app code don't take long to rebuild, but the pyproject.toml is broken for other things, including tests (which fail with a "failed to parse pyproject.toml" error). At the moment I don't think we're actually building a databuilder package, but I assume we probably will in future.

2) https://github.com/opensafely-core/databuilder/pull/774 This updates the Dockerfile with similar improvements to those in cohort-extractor, but doesn't pip install using the pyproject.toml. It keeps the entrypoint we had before (python3.9 -m -B databuilder) but moves it out into an entrypoint script, so that the script can be set as the ACTION_EXEC env var and passed as the entrypoint to the base-action.
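
Roughly, the shape of (2) is something like the sketch below. This is illustrative only and not copied from the PR: the image tag, the /app paths and the entrypoint.sh name are all assumptions, and the dependency-install steps are left out.

```dockerfile
# Sketch only: image tag, paths and the script name are assumptions,
# not taken from the PR.
FROM ghcr.io/opensafely-core/base-action:latest

# (dependency installation would go here, before the app code,
# so that code-only changes do not invalidate that layer)

# Copy the application code late in the build
COPY databuilder /app/databuilder

# Hypothetical wrapper script that holds the existing python invocation
COPY entrypoint.sh /app/entrypoint.sh
RUN chmod +x /app/entrypoint.sh

# Set the wrapper script as ACTION_EXEC so base-action uses it as the entrypoint
ENV ACTION_EXEC=/app/entrypoint.sh
```

Keeping the invocation in a script means the Dockerfile only has to name one executable, which is what ACTION_EXEC expects in this sketch.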

Both of these versions work to run a project locally, by: