praetorian-inc / noseyparker

Nosey Parker is a command-line program that finds secrets and sensitive information in textual data and Git history.
Apache License 2.0

Speed up Docker image creation #159

Closed seqre closed 6 months ago

seqre commented 6 months ago

I got annoyed by how long the Docker CI takes, so I looked for ways to speed it up. The biggest issue is that all of the compiler's previous work (downloading dependencies, compiling dependencies, compiling the crates) is thrown away whenever the source code is modified. This PR attempts to fix that by splitting the work into multiple stages in the Dockerfile and using cargo-chef together with the GitHub Actions cache for Docker layers.

Note: The PR so far contains only working improvements for the Debian Dockerfile. The same change in Alpine requires a different approach, and I'm still working on it.

How it works

  1. Create a chef image with cargo-chef installed. This should be a one-time step that is cached and not repeated.
  2. Create the planner stage (based on the chef image), which generates cargo-chef's recipe. It runs on every source code change but is fast (<1s), and the resulting recipe changes only when the dependencies change.
  3. The builder stage (also based on the chef image) does the following:
    1. Copies the recipe from the planner. This runs once per recipe change and is otherwise cached.
    2. Runs cargo-chef cook (downloads and compiles the dependencies). This runs once per recipe change and is otherwise cached.
    3. Builds the release as before.
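
The stages above could look roughly like the following Dockerfile. This is a minimal sketch based on the standard cargo-chef pattern, not the actual Dockerfile from this PR: the `/app` working directory, the `recipe.json` file name, and the `noseyparker` binary name and path are assumptions, and any system packages the real build needs are omitted.

```dockerfile
# Stage 1: chef image with cargo-chef installed (cached after the first build)
FROM rust:1.76-bullseye AS chef
RUN cargo install cargo-chef --locked
WORKDIR /app

# Stage 2: planner; reruns on every source change, but the recipe it
# produces only changes when the dependencies change
FROM chef AS planner
COPY . .
RUN cargo chef prepare --recipe-path recipe.json

# Stage 3: builder
FROM chef AS builder
# Copy only the recipe, so the next layer stays cached while it is unchanged
COPY --from=planner /app/recipe.json recipe.json
# Download and compile dependencies only (the expensive, cacheable layer)
RUN cargo chef cook --release --recipe-path recipe.json
# Copy the real sources and build the release as before
COPY . .
RUN cargo build --release

# Runtime image (binary name and path are assumed for illustration)
FROM debian:11-slim AS runtime
COPY --from=builder /app/target/release/noseyparker /usr/local/bin/noseyparker
ENTRYPOINT ["noseyparker"]
```

The ordering is what makes the cache effective: the `cargo chef cook` layer depends only on `recipe.json`, so a source-only change invalidates nothing before the second `COPY . .`.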

Benchmark

To compare the original Dockerfile with the modified one, I did the following for each:

  1. Clean the Docker builder cache and pull the base images used by both Dockerfiles (to reduce the impact of network speed between tests): docker builder prune -f && docker pull rust:1.76-bullseye && docker pull debian:11-slim
  2. Build a Docker image
  3. Make a dummy source code change (add fn dummy() {} to noseyparker/src/lib.rs) to force recompilation, and build the image again

These numbers are only a ballpark: I was using the machine for other things during the runs, which added a lot of noise.

| | Original | Modified | Difference |
|---|---|---|---|
| Clean build | 6m 50s | 8m 54s | 2m 4s (~30% slower) |
| Iterative build | 6m 1s | 3m 48s | 2m 13s (~37% faster) |
| Clean vs iterative diff | 49s (~12% faster) | 5m 6s (~57% faster) | |

Building the chef image for the first time takes around 50 to 60s. Once it has been built and cached, it should not need to be rebuilt, so realistically about a minute can be subtracted from the modified clean build time.

Conclusion

Since the dependencies don't change that often, most CI runs would reuse the cached dependency layers, so this change could speed up the Debian part of the Docker CI by roughly 37%.