Slow docker caching in CI

bjorn3 commented 2 weeks ago

The problem seems to be that the built image, including all layers, is huge (~440MB). Building the docker container installs several big dependencies needed to build sudo-rs, including clang, and in any case starts from a relatively large base image that contains rustc. While clang is uninstalled again near the end, this has no effect on the cache size, because every file that existed in any layer ends up in the cache. And this can't be fixed without merging all layers into one, which would destroy caching entirely.
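To make the layer argument concrete, here is a minimal Rust sketch of the arithmetic. The `Layer` type, the function names, and the per-layer sizes are all hypothetical; the sizes are made-up numbers chosen only so that they add up to roughly the ~440MB figure, not measurements of the real image.

```rust
/// One image layer (hypothetical model, sizes in MiB).
struct Layer {
    /// Data this layer adds on top of the previous ones.
    added_mib: u64,
    /// Data from earlier layers that this layer deletes.
    deleted_mib: u64,
}

/// Size of the layer cache: every layer is stored as-is, so deletions
/// in later layers never shrink it (whiteout entries are ignored here).
fn cache_size_mib(layers: &[Layer]) -> u64 {
    layers.iter().map(|l| l.added_mib).sum()
}

/// Size of the final filesystem once all layers are applied.
fn final_filesystem_mib(layers: &[Layer]) -> u64 {
    let added: u64 = layers.iter().map(|l| l.added_mib).sum();
    let deleted: u64 = layers.iter().map(|l| l.deleted_mib).sum();
    added.saturating_sub(deleted)
}

/// Illustrative layers only; the numbers are assumptions.
fn example_layers() -> Vec<Layer> {
    vec![
        Layer { added_mib: 300, deleted_mib: 0 },   // base image with rustc
        Layer { added_mib: 130, deleted_mib: 0 },   // install clang
        Layer { added_mib: 10, deleted_mib: 0 },    // build sudo-rs
        Layer { added_mib: 0, deleted_mib: 130 },   // uninstall clang
    ]
}
```

With these assumed numbers the final filesystem shrinks after clang is removed, but the amount of data that has to be cached stays the same.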

I propose building sudo-rs outside of the container, using the rustc and clang installed on the host, and then copying the built binaries into an image based on debian:bookworm-slim, which is also used for og-sudo. Rustc needs to be installed on the host anyway to even build the test framework. And I'm not certain clang is still necessary; it may be a leftover from when bindgen was used at compile time. Doing all of this would significantly reduce the size of the docker image we need to cache in CI and thus make caching faster.
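As a rough sketch of what the resulting image could look like, the Rust function below renders a Dockerfile that only copies prebuilt binaries into a slim base image. The function name `render_dockerfile` and the host and image paths in the usage note are hypothetical; only the debian:bookworm-slim base comes from the proposal above.

```rust
/// Render a Dockerfile that starts from `base_image` and copies each
/// prebuilt binary from the build context into the image.
/// `binaries` holds (path in build context, path inside image) pairs.
fn render_dockerfile(base_image: &str, binaries: &[(&str, &str)]) -> String {
    let mut out = format!("FROM {base_image}\n");
    for (host_path, image_path) in binaries {
        // No compiler or clang is installed in the image: the binaries
        // were already built on the host.
        out.push_str(&format!("COPY {host_path} {image_path}\n"));
    }
    out
}
```

For example, `render_dockerfile("debian:bookworm-slim", &[("target/release/sudo", "/usr/bin/sudo")])` yields a two-line Dockerfile with one `FROM` and one `COPY` instruction, so the only layer on top of the base image is the copied binary.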

squell commented 2 weeks ago

I need some context: which containers is this specifically talking about? (E.g. only compliance test related, or other steps as well?)

My two cents: we have a conservative MSRV, so on the one hand sudo-rs should of course be buildable on the host, but we also have some steps where we want the latest version (Miri and clippy specifically feel useful), and of course we have the "minimal version" check where we use the MSRV.

rnijveld commented 2 weeks ago

I believe the main reasoning was local development: having docker do the build allows it to auto-update whenever you change something, and allows you to just call cargo test again in the test framework to rerun the tests. There are ways to optimize the docker caching as well: make sure the base layer containing things like clang and rustc is already uploaded to wherever the caching takes place. That way the sync only needs to upload the layers created by the application, which most likely aren’t that large.

bjorn3 commented 2 weeks ago

> I need some context: which containers is this specifically talking about? (E.g. only compliance test related, or other steps as well?)

I'm talking about the compliance tests only. The rest doesn't use docker at all.

> I believe the main reasoning was local development: having docker do the build allows it to auto-update whenever you change something, and allows you to just call cargo test again in the test framework to rerun the tests.

The test framework can build sudo-rs right before it builds the docker image.
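A minimal sketch of that ordering in Rust, using only `std::process::Command`. The function names, the image tag, and the build context directory are hypothetical, and this is not how the test framework is actually implemented; it only shows the host build running before the image build.

```rust
use std::io;
use std::process::Command;

/// `cargo build --release --target <target>`, run on the host.
fn cargo_build_command(target: &str) -> Command {
    let mut cmd = Command::new("cargo");
    cmd.args(["build", "--release", "--target", target]);
    cmd
}

/// `docker build -t <tag> <context_dir>`, run after the host build.
fn docker_build_command(tag: &str, context_dir: &str) -> Command {
    let mut cmd = Command::new("docker");
    cmd.args(["build", "-t", tag, context_dir]);
    cmd
}

/// The two steps in the order they must run.
fn build_steps(target: &str, tag: &str, context_dir: &str) -> Vec<Command> {
    vec![
        cargo_build_command(target),
        docker_build_command(tag, context_dir),
    ]
}

/// Program name followed by its arguments, for logging or inspection.
fn command_line(cmd: &Command) -> Vec<String> {
    std::iter::once(cmd.get_program())
        .chain(cmd.get_args())
        .map(|s| s.to_string_lossy().into_owned())
        .collect()
}

/// Run each step in order and stop at the first failure.
fn run_all(steps: &mut [Command]) -> io::Result<()> {
    for step in steps {
        let status = step.status()?;
        if !status.success() {
            let msg = format!("{step:?} failed with {status}");
            return Err(io::Error::new(io::ErrorKind::Other, msg));
        }
    }
    Ok(())
}
```

Calling `run_all(&mut build_steps("x86_64-unknown-linux-gnu", "example/sudo-rs-test", "."))` would build on the host first and only then build the image, so a compile error surfaces before docker is invoked at all.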

> There are ways to optimize the docker caching as well: make sure the base layer containing stuff like clang and rustc is already uploaded to wherever the caching takes place, that way the sync only needs to upload the layers created by the application, which most likely aren’t that large.

That requires uploading them to a real registry, right? Currently CI just uploads the entire docker cache to a GitHub Actions cache every time.

squell commented 2 weeks ago

For the compliance tests in CI, running them with the "host" rustc has, to me, the further benefit of being closer to the tools our downstream packagers use for production builds (when the test framework was set up last year this wasn't an option, since Debian hadn't caught up with 1.70 yet).

Of course we also need to be able to easily run the compliance tests locally (without having to manually build an image, upload it to the container, etc).

@japaric

bjorn3 commented 2 weeks ago

I've got a WIP branch at https://github.com/bjorn3/sudo-rs/tree/ci_changes2.

rnijveld commented 2 weeks ago

> > I believe the main reasoning was local development: having docker do the build allows it to auto-update whenever you change something, and allows you to just call cargo test again in the test framework to rerun the tests.

> The test framework can build sudo-rs right before it builds the docker image.

Sounds like a good idea to me! One thing to keep in mind is to let the rust compiler target the same target as the container image (i.e. probably amd64-linux).
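To illustrate the target point: "amd64-linux" is the container-side name, while rustc expects a target triple such as `x86_64-unknown-linux-gnu`. A small sketch of the mapping, where the function name `rust_target_for` is hypothetical and only two common architectures are covered, assuming a glibc-based image like Debian:

```rust
/// Map a Docker architecture name to the Rust target triple to pass
/// to `cargo build --target`, assuming a glibc-based Linux image.
/// Returns `None` for architectures this sketch does not cover.
fn rust_target_for(docker_arch: &str) -> Option<&'static str> {
    match docker_arch {
        "amd64" => Some("x86_64-unknown-linux-gnu"),
        "arm64" => Some("aarch64-unknown-linux-gnu"),
        _ => None,
    }
}
```

Returning `None` for unknown architectures lets the caller fail loudly instead of silently building for the host's default target, which would only work by accident when host and image happen to match.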

> > There are ways to optimize the docker caching as well: make sure the base layer containing stuff like clang and rustc is already uploaded to wherever the caching takes place, that way the sync only needs to upload the layers created by the application, which most likely aren’t that large.

> That requires uploading them to a real registry, right? Currently CI just uploads the entire docker cache to a GitHub Actions cache every time.

That's true, it would probably be much easier to implement the above.

Another thing we could look at is FreeBSD containers/jails, so that we can run the same tests for FreeBSD (given that the Linux docker container will never test any FreeBSD code).

trifectatechfoundation / sudo-rs

Slow docker caching in CI #874