populationgenomics / analysis-runner

Alter Docker build file #682

Closed MattWellie closed 7 months ago

MattWellie commented 7 months ago

This started as a deletion of the sample-metadata dependency, which is redundant now that the newer metamist dependency is included. Happy to fall back on that as a minimal change (unless there's a reason we are keeping both deps?).

The proposed changes split the crazy 3.6GB mono-layer into 3 components:

My belief is that this change:

AFAIK this is standard Docker theory, and our current design is non-optimal.


Update: layer sizes

This results in 3 layers:

  1. 458MB
  2. 2.5GB
  3. 616MB

So unsurprisingly most of the weight is in the Hail installation, but it's a little more spread out. I'm experimenting with moving PhantomJS into the relatively static layer 1 as well.
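
As a quick sanity check on the figures above, the three layer sizes can be summed and compared against the original 3.6GB mono-layer. This is only a rough sketch: it assumes decimal units (1 GB = 1000 MB) and uses the rounded sizes quoted in this comment, so the percentages are approximate.

```python
# Layer sizes reported in the update above, in MB.
# Assumption: decimal units (1 GB = 1000 MB) and the rounded figures as quoted.
layer_sizes_mb = [458, 2500, 616]

total_mb = sum(layer_sizes_mb)
total_gb = total_mb / 1000

# Fraction of the total image size contributed by each layer.
shares = [size / total_mb for size in layer_sizes_mb]

print(f"total: {total_gb:.1f} GB")
for index, (size, share) in enumerate(zip(layer_sizes_mb, shares), start=1):
    print(f"layer {index}: {size} MB ({share:.0%})")
```

The total comes out at roughly the same size as the original single layer, so the split redistributes the weight rather than reducing it; any benefit would come from caching the layers that change less often.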

illusional commented 7 months ago

Thanks for the suggestion @MattWellie!

I've been thinking about this a bit over the week. I'm totally here for this PR, but our current build mechanism causes a full image rebuild every time it runs, which almost certainly changes the Docker layer hashes, resulting in extra layers and a larger image size. I wonder if it's worth breaking these up into different tagged images, so we can pull in the specific image and reduce the build time (which I'd really love).

On a similar note, it would be great to move this image to the images repo and, as part of that, have some way to chain images together, so that a rebuild of a base image triggers a rebuild of the images chained to it. How that works with floating tags I'm not 100% sure yet.
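
To make the chaining idea concrete, here is a minimal sketch of how the set of images to rebuild could be derived from a "built FROM" mapping. The image names and the `images_to_rebuild` helper are hypothetical, not anything that exists in the images repo, and the sketch assumes each image has at most one base image. It does not address floating tags.

```python
from collections import deque

# Hypothetical image names; each image maps to the base image it is built FROM
# (None means it is built from an external image and has no in-repo base).
base_of = {
    "driver-base": None,
    "driver-hail": "driver-base",
    "driver": "driver-hail",
    "unrelated-tool": None,
}


def images_to_rebuild(changed, base_of):
    """Return `changed` plus every image chained on top of it, bases before children."""
    # Invert the mapping: base image -> images built directly on it.
    children = {}
    for image, base in base_of.items():
        if base is not None:
            children.setdefault(base, []).append(image)

    # Breadth-first walk; with one base per image, a base is always visited
    # before the images that depend on it.
    order, queue = [], deque([changed])
    while queue:
        image = queue.popleft()
        order.append(image)
        queue.extend(children.get(image, []))
    return order


print(images_to_rebuild("driver-base", base_of))
```

Rebuilding the first image in the chain would trigger the other two in order, while a change to the last image would rebuild only that image.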

MattWellie commented 7 months ago

FYI @illusional https://github.com/populationgenomics/images/issues/139 (I haven't assigned you, but you should be aware that this is a related issue)

MattWellie commented 7 months ago

Closing this; the related issue covers beefing up our image building more generally.