kubernetes-sigs / image-builder

Tools for building Kubernetes disk images
https://image-builder.sigs.k8s.io/
Apache License 2.0

Golang implementation of image-builder with buildkit backend #1633

Open rajaskakodkar opened 5 hours ago

rajaskakodkar commented 5 hours ago

Is your feature request related to a problem? Please describe.

Currently, there are a number of issues with building and testing image-builder:

  1. There is no clear demarcation between OS-level tasks and Kubernetes-related tasks, making this a tightly coupled system: a change in OS configuration forces a rebuild of the entire image rather than just the OS layer, and vice versa for Kubernetes.
  2. Ansible configurations have grown substantially, making the project extremely flexible and configurable but also difficult to maintain.
  3. Packer remains a build-time dependency and has been pinned to a specific version because of licensing issues. Ref: https://github.com/kubernetes-sigs/image-builder/issues/1246
  4. While Goss provides some validation of the project's artifacts, there is still a gap in e2e test coverage: inspect tests that provide a clear signal about the final state of the machine images. Also highlighted in https://github.com/kubernetes-sigs/image-builder/issues/1605

Describe the solution you'd like

One-liner pitch: a Golang implementation of image-builder with buildkit as its backend, providing OCI layers for OS- and Kubernetes-related actions.

Slightly more detailed version: the idea is to transform the image-builder build system as follows:

  1. Create a raw disk from scratch with loop devices, etc. mounted on it
  2. Superimpose the distro provider filesystem on the disk. To create this filesystem, start with an OCI image of the OS and implement all the tasks image-builder performs at the OS level in Golang, with buildkit LLB as the backend: https://github.com/moby/buildkit/blob/c1dacbc5ce0544ff72f7dc8acd9b99f015c2021a/docs/dev/dockerfile-llb.md
  3. Independently create an OCI Image with all the Kubernetes and friends (containerd, etc) related tasks and curate the filesystem as expected by Kubernetes and Cluster API
  4. Superimpose the layers from steps 2 and 3 to produce the combined filesystem for the machine image
  5. Mount that filesystem on the raw disk created in step 1
  6. Use independent tools (openvmdk, VHD and AMI tooling, etc.) to create machine images for the various providers

This creates a clear demarcation between the OS and Kubernetes layers and moves the project to a Golang implementation. It can then be extended with an e2e suite of inspect tests for automated testing.

This also moves away from packer.

This approach is distro agnostic and can work for Windows as well.

Describe alternatives you've considered

systemd-sysext (https://man.archlinux.org/man/systemd-sysext.8.en) has appeared in the community, and maybe there is a middle ground here: figuring out how to integrate the OCI approach with systemd-sysext. Comments welcome!

A caveat here is that systemd-sysext will work only on systemd-based distros, not on Windows.

Another alternative, for Linux distros, is bootc: https://github.com/containers/bootc

Additional context

My partners in crime in brainstorming this have been @randomvariable and @clebs, and we want to gauge the community's appetite for this feature. Happy to contribute to this effort!

cc @AverageMarcus @mboersma


/kind feature

rajaskakodkar commented 5 hours ago

cc @t-lo to see how systemd-sysext fits here

AverageMarcus commented 5 hours ago

Some quick thoughts...

  1. Have you PoC'd this approach? I did a little research after our chat at KubeCon and couldn't find anything reliable on how to convert an OCI image to a VM disk image. Things like the kernel usually aren't included in OCI images and need some special attention.

  2. You mention "Independently create an OCI Image with all the Kubernetes and friends (containerd, etc) related tasks and curate the filesystem as expected by Kubernetes and Cluster API" to have a separation between OS and non-OS things, but I'm not sure this is actually feasible in practice. Please correct me if I'm wrong, but I'm pretty sure different OSes expect things to be in different locations - e.g. Flatcar vs. Debian vs. Windows. I'm not sure we could get away with creating a generic layer that works for all distros.

  3. I'd like to hear from the Flatcar folks who have been working on sysext (thanks for pinging Thilo) and get their thoughts on how this could integrate.

  4. Big +1 on improving testability! 💙 We're sorely lacking in this area in the project and it's hit us a few times now.

  5. I like the thought of "reining in" the project a bit. The current amount of configurability is not maintainable and makes all support issues difficult. It would be good to have a moderately strict set of configuration options that we support with this new approach, but I also know that people need more, so we'd need to think of some "hook" that people could use to layer on their additional, unsupported configurations.

  6. I like the idea of having things more modular: a section that is the base OS, a section that is the Kubernetes binary layer, and another section that is "convert to specific cloud provider". The benefit of this is we could have more maintainers for each of these specific areas - e.g. someone from Azure who is just responsible for the "convert to Azure disk image" bit. That way us maintainers don't have to be experts on everything (because that's impossible 😅)

t-lo commented 4 hours ago

@rajaskakodkar Thanks for pulling me in! TL;DR: sysexts would blend well with just about any image-building approach you choose.

We would preferably compose sysexts into the system at provisioning time. This way, we can provision stock distro images instead of being required to host our own, self-built images. However, we absolutely recognise use cases where pre-built images are preferred and the additional overhead of self-hosting images is acceptable. In these cases, using sysexts would be as easy as adding 2 or 3 files to well-known locations in the disk image. This should be very straightforward and does not require any exotic actions like running tools from inside the disk image (as is currently the case with image-builder). Given the flexibility of sysexts, this could be done at either step 2 or steps 3/4 of your process - without losing the ability to update the Kubernetes bits independently.

Re: using OCI images, @AverageMarcus raises a good point - you need to figure out the boot process. There's some prior work for that in the bootc project, which specialises in bootable containers, and requires (to my understanding) special-crafted container images with kernel and bootloader: https://containers.github.io/bootc/intro.html.

Re: Marcus' 2nd point, there are minor differences but for Kubernetes specifically, these are negligible - Kubernetes already does a great job at being self-contained. Providers would need to adjust service management (likely systemd on most systems) and maybe logging. I'm not sure about Windows though.

I also wouldn't underestimate vendor support - providing images for cloud vendors will require integration work on the base OS, as well as continued testing. This too is a reason we prefer provisioning-time composition: we directly benefit from the upstream distros' integration work for the various clouds :)

Lastly, and just out of curiosity, did you have a look at mkosi? It seems to be the go-to tool for building distro images nowadays: https://github.com/systemd/mkosi/ and I know of at least one Kubernetes distro - Edgeless' "Constellation" - that uses mkosi and is happy with it. It currently doesn't support Windows though, and I personally don't have much experience with it (the tooling we use in Flatcar to build vendor images predates mkosi by several years, and we didn't investigate integrating mkosi with Flatcar yet).