deitch opened this issue 2 years ago
I can't understand how named contexts or bake can help. As mentioned in the Problem Description, buildkit with container-driver only supports loading `FROM` images via networked registries, i.e. `FROM eve/foo:1.2.3` will only look in dockerhub for `eve/foo:1.2.3`. It does not look in the local cache, whether linuxkit's or docker's.

From my understanding, whatever alias we have for `eve/foo:1.2.3`, linuxkit will still look in docker hub for it. I guess we could alias the name to something that already exists in docker hub, but that is pointless, because the whole purpose of this effort is to use something we build locally.
I am going to try to lay out the build issue with a practical example, and then show how each potential solution impacts it.

Let's say we are a local developer or CI. We are making changes to `lfedge/eve-uefi`, on which the downstream `lfedge/eve-xen-tools` depends. So our Dockerfiles and build process look something like this:
```shell
# first build lfedge/eve-uefi:
$ linuxkit pkg build pkg/uefi
# if the current git tree hash for pkg/uefi is, e.g., 1234567abcd, then
# the above command will create an image - STORED LOCALLY - called "lfedge/eve-uefi:1234567abcd"

# next build lfedge/eve-xen-tools
$ linuxkit pkg build pkg/xen-tools
# if the current git tree hash for pkg/xen-tools is ffbb6677, then the above command will
# create an image - STORED LOCALLY - called "lfedge/eve-xen-tools:ffbb6677"
```
The file `pkg/xen-tools/Dockerfile` has the line (generated by builds):

```dockerfile
FROM lfedge/eve-uefi:1234567abcd
```
For the second build to work, the builder process must be able to find `lfedge/eve-uefi:1234567abcd`, the output of the first.

Recall that `lfedge/eve-uefi:1234567abcd` is the result of the first build. It was never pushed to a registry, and exists only locally. The name `lfedge/eve-uefi:1234567abcd` actually means, fully parsed, `docker.io/lfedge/eve-uefi:1234567abcd`, i.e. that image on the (implicit) docker hub.

Because we are in a build process, actually testing changes, the image `lfedge/eve-uefi:1234567abcd` cannot be pushed until we are done fully testing; it might never get pushed at all, if CI finds issues and we change the files to fix them.
Looking at the potential solutions.

The first option: that interim, possibly temporary, image `lfedge/eve-uefi:1234567abcd` actually gets pushed to Docker Hub. This would ensure it is always available, but it would pollute Docker Hub with lots of images, many of them failures, risking someone actually using one.

In addition, this would only work for CI, which has credentials to push to `docker.io/lfedge/*`. For developers working locally, who do not (and should not) have such credentials, it will not work.
In a "normal" `docker build`, the line:

```dockerfile
FROM lfedge/eve-uefi:1234567abcd
```

is interpreted as: "first look for `lfedge/eve-uefi:1234567abcd` in my local image cache, and only go to the registry if I cannot find it."

This is useful behaviour for builds, and nearly every docker-based process depends upon it for the same reasons we do: build locally, `FROM` locally, and only push when you are sure everything is good.

Unfortunately, this does not work with container-driver buildkit, which is the source of all of our problems. There is no way to tell container-driver buildkit, "go look in this cache before going to a registry".
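For contrast, a minimal sketch of the classic chained flow that relies on this local-cache behaviour (the image names reuse the example above; these commands assume a docker daemon with the classic builder):

```shell
# Build the upstream image; classic `docker build` stores it in the local image cache.
docker build -t lfedge/eve-uefi:1234567abcd pkg/uefi

# Build the downstream image. Its Dockerfile says `FROM lfedge/eve-uefi:1234567abcd`,
# which the classic builder resolves from the local cache - no registry, no push needed.
docker build -t lfedge/eve-xen-tools:ffbb6677 pkg/xen-tools
```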
Another option is a local registry. In this case, even though the image name on the line is:

```dockerfile
FROM lfedge/eve-uefi:1234567abcd
```

which means "use the image `lfedge/eve-uefi:1234567abcd` from Docker Hub", we use a local registry to store the image.

For this to work, the builder would need to know, "do not go to docker hub, but rather go to some local registry at `localhost:8800` (or wherever)". The way to do that is to use build args in the Dockerfile:

```dockerfile
ARG uefi=lfedge/eve-uefi:1234567abcd
FROM ${uefi}
```

A normal `docker build` would just go to docker hub, but if you run `docker build --build-arg uefi=localhost:8800/lfedge/eve-uefi:1234567abcd`, it will fetch a different image.
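Putting the pieces together, a hedged sketch of the full local-registry flow (the registry port, container name, and image names are illustrative):

```shell
# Run a throwaway local registry on port 8800.
docker run -d -p 8800:5000 --name localreg registry:2

# Push the locally built upstream image into it.
docker tag lfedge/eve-uefi:1234567abcd localhost:8800/lfedge/eve-uefi:1234567abcd
docker push localhost:8800/lfedge/eve-uefi:1234567abcd

# Point the downstream build at the local registry via the build arg.
docker build --build-arg uefi=localhost:8800/lfedge/eve-uefi:1234567abcd pkg/xen-tools
```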
The challenges are:

- the `FROM` depends on another file that might change during the development process

This solution should work, if somewhat messy.
The best option is native buildkit support. I have wanted this for nearly a year. The same way docker says, "I will first look in my local cache, and only go to the registry if I cannot find it", I want buildkit to support the same thing. This is the best solution, both technically and operationally, but we are waging an uphill battle to make it happen.
Named contexts essentially let you alias an image on a `FROM` line to a different source. So if the Dockerfile is:

```dockerfile
FROM lfedge/eve-uefi:1234567abcd
```

we could run `docker build --build-context lfedge/eve-uefi:1234567abcd=<some other source>`. In this case, the builder would "override" getting `lfedge/eve-uefi:1234567abcd` from docker hub, and instead get it from `<some other source>`.
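As a hedged sketch, using `docker buildx build`, whose `--build-context` flag accepts a `docker-image://` prefix for image sources (the replacement image name is illustrative):

```shell
# Override where the FROM image comes from, without touching the Dockerfile.
docker buildx build \
  --build-context lfedge/eve-uefi:1234567abcd=docker-image://quay.io/foo/other-image:tester \
  pkg/xen-tools
```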
That some other source could be, for example, a different registry image (e.g. replacing `docker.io/lfedge/eve-uefi:1234567abcd` with `quay.io/foo/other-image:tester`).

For this to work, we would need:
Bake files are just fancier versions of buildkit build contexts. Rather than specify the alias on the command line with `--build-context`, you put it in a bake file, which contains the aliasing. For example, we would do:

```hcl
target "xen-tools" {
  contexts = {
    "lfedge/eve-uefi:1234567abcd" = "<some other source>"
  }
}
```

This then becomes the same question: where is `<some other source>` and how do we get the image there?
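A hedged usage sketch for a bake file like the one above (the file name and target name are illustrative; `docker buildx bake` reads `docker-bake.hcl` by default):

```shell
# Build the target defined in the bake file; its contexts block
# rewrites the FROM image source before the build starts.
docker buildx bake -f docker-bake.hcl xen-tools
```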
A bake file is still required to publish to the registry or cache. This is a minimal example with commands: https://github.com/ruslan-zededa/lkt-try

You don't need a bake file to publish. `buildx build` supports all sorts of output formats.
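For illustration, a few of the `--output` types that `docker buildx build` supports (the destinations and tags are illustrative):

```shell
# Load the result into the local docker image store.
docker buildx build --output type=docker -t lfedge/eve-uefi:1234567abcd pkg/uefi

# Write the result as an OCI-layout tarball instead of pushing anywhere.
docker buildx build --output type=oci,dest=uefi.tar pkg/uefi

# Push straight to a registry (local or remote).
docker buildx build --output type=registry -t localhost:8800/lfedge/eve-uefi:1234567abcd pkg/uefi
```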
Someone had a good idea about using bake files, but only when dealing with the local cache. I might try that, and if it works, get it into linuxkit.

It still becomes a headache. You would need to know, when building the first image, that you should "extra cache" it, and, when building the second image, that you should source from that cache. What does that CLI look like?

```shell
lkt pkg build A --cache-local
lkt pkg build B --image-from A cache-local
```

Something like that? This is ugly.
I may have a better solution. Still not perfect, but it looks better.

Recall the issue:

- when building `A`, lkt uses container-driver buildkit, and caches the result in its own OCI-layout-formatted cache
- when building `B`, lkt uses container-driver buildkit, which looks for `A` in a registry, does not find it, and errors out

As described above, buildkit has the ability to use build contexts (via `build` or `bake`) to "alias" images to other sources, including registries, local directories, git repos, etc. If we add OCI layout as a place to get it - which they have agreed to - then we could tell it, "go find `A` in my local OCI layout cache".
The process then would look like:

```shell
lkt pkg build A                 # unchanged
lkt pkg build B --from-cache A  # new flag, meaning "go get A from my linuxkit cache"
```
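Under the hood, this relies on buildkit gaining an `oci-layout://` build-context source. A hedged sketch of what a direct buildx invocation might look like once that lands (the cache path and digest are illustrative placeholders, not real values):

```shell
# Alias the FROM image to an image stored in a local OCI-layout directory,
# addressed by digest.
docker buildx build \
  --build-context lfedge/eve-uefi:1234567abcd=oci-layout://$HOME/.linuxkit/cache@sha256:0123456789abcdef \
  pkg/xen-tools
```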
I am working on it. Once I fully understand the way to add a source, I will submit a PR to buildkit. It is a bit (a lot) challenging to grasp.
> lkt pkg build A # unchanged
> lkt pkg build B --from-cache A # last flag added, meaning "go get A from my linuxkit cache"
@deitch Can there be a fallback so that we don't need to track whether A is in the build cache, i.e., have it look in the cache first and, if not found, also look in the registry?
I am +1 on @eriknordmark suggestion to the last @deitch idea -- @deitch what would it take to implement this in lkt?
Yeah, I think it can.
To recap:
Great @deitch ! Btw, what about cross builds in buildkit -- with what you're working on -- will we have that for free or is it still separate work?
Depends what you mean by cross builds. If you mean "build for amd64 while on arm64, using emulation", then yeah, lkt already does it. It also can inherit remote builders for quicker builds. We just don't take advantage of it because of that issue. This will inherit all of that.
If you mean literal cross compiling, as in native go on amd64 compiling for arm64, it doesn't do that yet. I could get us there, but it is a lift.
> Depends what you mean by cross builds. If you mean "build for amd64 while on arm64, using emulation", then yeah, lkt already does it. It also can inherit remote builders for quicker builds. We just don't take advantage of it because of that issue. This will inherit all of that.
Yeah -- that's what I'm after -- can you please keep that use case in mind while you're working on this issue? I can help test etc. too -- I really hope we can restructure our Dockerfiles/lkt/buildx setup in such a way that it simply works out of the box. If not, we may need to file additional issues I guess -- but that kind of investigation is what I'm after.
The first buildkit PR is merged in!!
The first linuxkit PR is open and CI is green. More coming.
Linuxkit PR is merged in. Linuxkit now executes buildkit directly.
One more PR for linuxkit to support the cached build, and it is done.
This is a tracking issue for the chained builds problem. It is intended to be used as a place to track the issue and discuss possible solutions.
cc the following people who have been helping look into it: @eriknordmark @rvs @petr-zededa @ruslan-zededa . With many thanks to them.
Problem Description

- … each built from `Dockerfile`s. Each of these consumes other images in `FROM`, sometimes multiple.
- buildkit with container-driver only supports loading `FROM` images via networked registries, i.e. `FROM eve/foo:1.2.3` will only look in dockerhub for `eve/foo:1.2.3`. It does not look in the local cache, whether linuxkit's or docker's.

The above is the reason why the eve-os build process does not use the latest linuxkit. We cannot combine the docker-style "build an image, save it locally, build another that consumes that locally saved image" with the buildkit-style "build images across architectures". This is the catch-22.
Scope

This only affects `Dockerfile`-to-`Dockerfile` "chained" builds. It does not affect the final OS image build. The outputs of the various `docker build` commands are consumable by linuxkit wherever they are.

Impacted Files
In order to get a better handle on the scope of the problem, I ran a grep through all of lf-edge/eve to find what files are affected, excluding vendor directories and our unique `.go` caching system. The results show 85 `FROM` lines in `Dockerfile`s.

Next, I excluded any that will always or never come from a registry, primarily upstream `alpine`, `golang`, `unikernel`, `scratch`, simplifying it further. The results are down to 45 files.

Now we just need to see the value of `FROM` and how many times it is used. The results:
The `EVE_BUILDER_IMAGE` is set to `lfedge/eve-alpine:6.7.0` in all cases; it is just used as a build arg via `ARG EVE_BUILDER_IMAGE=lfedge/eve-alpine:6.7.0`. So we can redo the results as:

The problem arises when we want to build any one of those - primarily `lfedge/eve-alpine` - locally, and check its value by using it downstream. For example, if we make changes to `lfedge/eve-alpine`, call it `lfedge/eve-alpine:6.8.0-test`, and want to use it in other pkgs, the build would fail, as it is not pushed out to docker hub, nor should it be.

In cases where we are just building the downstream packages that consume a well-known and published `lfedge/eve-alpine`, it will not be an issue, as it is downstream.

The same is true for the interpolated values of `MKISO_TAG`, `MKCONF_TAG`, etc. If we are building just the downstream packages, then the currently interpolated value will already have been published. But if we are building the source of `MKISO`, and it has not been published yet, then the downstream build will fail.

Potential solutions
This section is for suggestions as to how to fix it. Some starting points, none of which is necessarily good:

- … `localhost:8800/lf-edge/eve-alpine:7.6.0` or similar in the dockerfile. This might work if we set all of them to use build args, but it is messy and requires a lot of extra overhead. It also is not clear that linuxkit supports build-args (although those could be added).

Looking forward to insights.