rpm-software-management / rpm

The RPM package manager
http://rpm.org

RFE: Standardize on OCI images for test-suite, even locally #2643

Closed: dmnks closed this issue 10 months ago

dmnks commented 1 year ago

Background

The rpmtests script, once built, is designed to be run as root and exercises the RPM installation in the root filesystem. When an individual test requires write access to the root filesystem (e.g. to install some packages), a writable snapshot is created with OverlayFS and a lightweight, fast container is spawned with Bubblewrap on top of it, running RPM or some other tool. This is to prevent the individual tests from affecting each other. All this logic is wrapped in the shell function snapshot() in atlocal.in.
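
For a rough idea, the function boils down to something like this (a simplified sketch, not the actual atlocal.in code):

```sh
# Simplified sketch of snapshot(): overlay a throwaway writable layer on
# the read-only base tree, then run the given command in a Bubblewrap
# container rooted at the merged view (assumes we're running as root).
snapshot() {
    base=$1; shift
    tmp=$(mktemp -d)
    mkdir -p "$tmp/upper" "$tmp/work" "$tmp/merged"
    mount -t overlay overlay \
        -o lowerdir="$base",upperdir="$tmp/upper",workdir="$tmp/work" \
        "$tmp/merged"
    bwrap --bind "$tmp/merged" / --dev /dev --proc /proc "$@"
}
```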

When hacking on RPM, it's typically not desired to install the build artifacts into the native root filesystem, though, so make check wraps the rpmtests script in a container of its own. This container runs on top of an OS image that mirrors the host and contains only the necessary runtime and test dependencies, in the versions matching the development headers used during the build. RPM is make install-ed on top of that filesystem to produce the final image. The parent container with rpmtests is typically spawned using the snapshot functionality built into the test-suite, with the lower layer being the image directory instead of /.

To build such an image, a host-native mktree script ("backend") is invoked by make check. It is expected to use the platform's (i.e. the Linux distribution's) native package management tooling, e.g. dnf --installroot on Fedora or the zypper equivalent on OpenSUSE. This is ideally done as an unprivileged user, with the use of Linux namespaces(7).
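
For illustration, on Fedora the bootstrap amounts to roughly this (a sketch; $TREEDIR, $PACKAGES and the release version are placeholders):

```sh
# Unprivileged tree bootstrap: map a uid/gid range with unshare(1), then
# let DNF install the runtime and test dependencies into the image dir.
unshare -rm --map-auto \
    dnf --installroot="$TREEDIR" --releasever=40 \
        --setopt=install_weak_deps=0 -y install $PACKAGES
```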

Currently, we only include mktree.fedora but the idea was to gradually add more such backends for the platforms where RPM is typically built and developed, at least for OpenSUSE and perhaps (experimentally) for Debian/Ubuntu too.

Problem

It turns out that, while this is quite easy to do on Fedora with DNF and unshare(1), it is not as easy on other distros that I've tried, namely OpenSUSE, where Zypper doesn't seem to like being run through unshare(1) and would likely require sudo instead. I've observed the same with Debian and debootstrap. While not a dealbreaker per se, we should really try to avoid sudo for something as trivial as a make check.

Another drawback of this approach is that we suddenly find ourselves in the business of maintaining distro-specific scripts where each needs its own set of tricks and workarounds, such as having to inject a bunch of RPM macros into DNF to make it behave properly for our purposes. This does not scale well and just makes our life harder in the long run as the packaging stacks evolve.

In fact, mkosi does almost what we need as it abstracts all these distro-specific details away from the user. In the latest version, it even works without root privileges and apparently can now run the build script natively, meaning that the local build directory produced by CMake could possibly be used.

However, there are still some other limitations preventing us from considering mkosi, such as the inability to run within a container, which makes it less portable. Running in a container is something we need for our CI environment, where we typically have a limited choice of operating systems (currently Ubuntu 22.04 LTS in GitHub Actions) and thus may or may not have all the build and runtime dependencies for the latest development snapshot of RPM available in the official distribution repositories.

Also, the core philosophy of mkosi is more like "wrap the build system in a container and act as the primary interface for building/testing" whereas we'd prefer it the other way around, i.e. "seamlessly integrate into our existing build system and reuse the local build artifacts".

Hence, mktree was born as a poor man's version of mkosi tailored to our use case: almost dependency-free, with some ideas and naming conventions shamelessly stolen from mkosi, but with the issues mentioned above.

Solution

As it happens, there already is a standardized way of distributing container images, namely OCI, more commonly known as Podman or Docker images. The public registries host images for a lot of different Linux distributions, certainly those that we care about. And most developer workstations likely have Podman or Docker installed already.

In fact, we already use these through mktree.podman and our Dockerfile. This backend currently acts as a fallback for non-Fedora distros and is our go-to backend in the CI. As opposed to mktree.fedora, mktree.podman uses Podman/Docker to also spawn the parent container from the image. This is obviously the most natural way once you're already using Podman/Docker to build the image.

The only downside of these premade images is that, in the case of Fedora or OpenSUSE, they, well, already contain a stock installation of RPM, and we would like to get rid of it before planting our own. That's what we do in our Dockerfile, by simply having RPM self-destruct with rpm --nodeps -e rpm .... This is ugly as it creates an additional layer internally in Podman/Docker, but hey, it works. In the future, we could always publish our own, pristine OCI image for RPM testing (basically the one we currently build with mktree.fedora), but that's really out of scope here :laughing:
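
For reference, the self-destruct step amounts to something like the following (the exact package list is elided above, so the one here is illustrative):

```sh
# Remove the stock RPM stack from the base image, ignoring dependencies;
# the already-running rpm process keeps working even after its own files
# are gone, so it can finish the transaction.
rpm -e --nodeps rpm rpm-libs python3-rpm   # illustrative package list
```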

So, why not just make mktree.podman the default backend everywhere and drop mktree.fedora? Well, it's currently not optimized for iterative make check use since it doesn't reuse the local build directory; instead, it builds RPM itself as part of the Dockerfile, and thus in a container, much like mkosi. That means you get a new CMake build from scratch on each make check, and the previous layer isn't even cleaned up properly, so the layer cache grows in size over time (yikes). It also prevents us from "hot swapping" the RPM installation, due to the nature of OCI layering. Hot swapping is useful in make shell, where you don't want your changes in the container to be dropped whenever you rebuild RPM; mktree.fedora supports it because it manages the layers as plain directories.

Conveniently, Podman allows for mounting an image into a directory on the host, which means we can simply use that directory instead of bootstrapping our own with DNF/Zypper/etc. Thus, we could reuse the layering in mktree.fedora for the rest.
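
A minimal sketch of that idea (the image name is just an example; rootless Podman needs the mount to happen inside podman unshare):

```sh
# Mount the image's filesystem and reuse the mountpoint as the OverlayFS
# lower layer for the existing snapshot logic.
podman unshare sh -c '
    lower=$(podman image mount registry.fedoraproject.org/fedora:latest)
    echo "using $lower as lowerdir"
    # ... mktree.fedora-style layering on top of $lower goes here ...
    podman image unmount registry.fedoraproject.org/fedora:latest
'
```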

Docker sadly doesn't support mounting images, but it does support extracting them into a destination directory, so we can fall back to that for Docker, with the small CPU and disk space penalty involved (it's still much faster than using a package manager to download the metadata, then the packages and finally install them).
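
The Docker fallback could look roughly like this ($TREEDIR is a placeholder):

```sh
# Export a throwaway container's filesystem into a plain directory.
ctr=$(docker create registry.fedoraproject.org/fedora:latest)
mkdir -p "$TREEDIR"
docker export "$ctr" | tar -C "$TREEDIR" -xf -
docker rm "$ctr" >/dev/null
```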

Summary

So, to summarize, the goal of this ticket is to merge mktree.fedora and mktree.podman such that a host-specific Dockerfile.<OSNAME> is used to build the image, instead of a mktree.<OSNAME> script that does that from scratch. It also has to support reusing the local build artifacts for local development.

This way, we'll delegate the image creation process to the appropriate tooling where it really belongs. We'll only need to keep a (bunch of) Dockerfiles which are well-defined and easy to maintain.

This change isn't targeting 4.19 since it's too late for that, and it's not relevant to the distributed tarball anyway; it's more of a feature/optimization for developers working off of a git checkout, as well as a simplification of the whole test-suite architecture.

pmatilai commented 1 year ago

So, to summarize, the goal of this ticket is to merge mktree.fedora and mktree.podman such that a host-specific Dockerfile.<OSNAME> is used to build the image, instead of a mktree.<OSNAME> script that does that from scratch. It also has to support reusing the local build artifacts for local development.

This way, we'll delegate the image creation process to the appropriate tooling where it really belongs. We'll only need to keep a (bunch of) Dockerfiles which are well-defined and easy to maintain.

:+1: to this from a high-level viewpoint; the Dockerfile is widely understood, let's not reinvent a wheel we don't have to. The other important point here is to avoid tests behaving differently locally and in CI (AIUI CI didn't need the dbus-libs tweak in 4712c9c37bbf28f56f1e386df788dac440cc4cb8 but local usage does, and that's unwanted double-maintenance if nothing else).

dmnks commented 1 year ago

Yup, just the fact that we currently have two distinct ways to create a test image is awkward and just attracts all kinds of bugs :laughing:

pmatilai commented 1 year ago

Yep, to me that'd be among the top issues to address.

dmnks commented 1 year ago

Ack, it's a confirmed bug now and will be dealt with like any other.

dmnks commented 1 year ago

I've come to realize that, by dropping the host-specific mktree scripts (e.g. mktree.fedora), we would also lose a couple of nice features, namely:

1) Simple software management in the test tree

The recently added make env target runs a shell on the host with a test root mounted at $RPMTEST, which can be used in the same way as in a normal test, i.e. to run RPM in it with runroot rpm and to inspect/modify the filesystem with the host tooling.

This, quite conveniently, also applies to the software installation in the test tree. One can now use the host-native package manager to extend the tree when doing interactive testing with make env. In fact, on Fedora, make env defines an alias that basically amounts to dnf --installroot=$RPMTEST to make this easier. Naturally, the full command is identical to the one used to bootstrap the tree in the first place.
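
Something along these lines (the alias name here is hypothetical; the actual one in make env may differ):

```sh
# Convenience alias for managing packages in the mounted test tree.
alias tree-dnf='dnf --installroot=$RPMTEST --releasever=40 -y'
# e.g.: tree-dnf install vim-minimal
```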

By contrast, the Dockerfile-native way is flipped inside out: it assumes that software is installed from within the base image, which ships with the native package manager from the start. However, that's not compatible with our needs, since we're actually developing a component of the very same packaging stack.

Therefore, separating the management of the test tree from the actual testing makes more sense. It's also why the test logic has always run outside of the test tree, not inside of it.

So, by going full Podman/Docker for image management, we would lose the host-specific wrappers (dnf --installroot=$RPMTEST ... on Fedora) in make env. Or, we could keep the wrappers, but at that point we might as well reuse them to actually set up the tree and avoid the extra Podman step, putting us back where we are now :smile:

2) Full control over the test tree

By bootstrapping the tree from scratch, we can control what software goes into it without resorting to a dnf remove or rpm -e --nodeps in an equivalent Dockerfile. Most notably, as mentioned in the description above, this pertains to the RPM installation itself which we need to purge before installing the local one.

Furthermore, if we were to go even further and use Podman/Docker to also spawn the wrapping containers (i.e. mktree.podman being the default), we would also lose this in local development:

3) Ability to swap out the RPM installation in a container

This is currently very useful in make shell and make env as it avoids the need to throw away your changes to the container each time you make a source change in RPM.

By using Podman/Docker to spawn the container, we would have to either make install the new artifacts over the ones already present in the container or rebuild the image, thus having to recreate the container too.

While the former (redoing make install) may not be a problem in most cases, it certainly isn't as clean as simply recreating the installation directory from scratch, which is what we currently do in mktree.fedora.

The latter (recreate the container on each change) would seriously hamper any useful container-based development.

dmnks commented 1 year ago

The other important point here is to avoid tests behaving differently locally and in CI (AIUI CI didn't need the dbus-libs tweak in 4712c9c but local usage does, and that's unwanted double-maintenance if nothing else)

Actually, adding dbus-libs to the Dockerfile wasn't needed because it already contains it :smile: It gets pulled in as a dependency of dbus-devel which we install there in order to build RPM (it's not needed in mktree.fedora obviously).

dmnks commented 1 year ago

And yes, that's still double-maintenance in a sense, but it's due to the fact that we do not need build deps of RPM in the local testing scenario, as those are installed on the host (RPM is built on the host). So, not really much of a problem.

dmnks commented 1 year ago

One way to fix that would be to simply build and test with the mktree.fedora backend even in the CI, but from a Fedora container, due to the build deps. The downside of that would be increased execution time in CI, since we would have to 1) build the Podman image for building RPM and then 2) set up the test tree from scratch with DNF, just like on the host.

That's basically the reason for the existence of mktree.podman, which reuses the build image for running the tests too. Basically, it's an optimization for CI, nothing else. It's not suitable for local development, though, due to the reasons outlined in my previous comments.

dmnks commented 1 year ago

To summarize again, these are our options to resolve the ticket:

1) No change, keep the native & podman backends

Image building:

Container layering:

Pros:

Cons:

2) Use an OCI base image

Image building:

Container layering:

Pros:

Cons:

3) Use OCI for full image & container management

Image building:

Container layering:

Pros:

Cons:

dmnks commented 1 year ago

In terms of usefulness/viability, the worst is option 3, followed by option 2 and then 1.

Basically, the biggest drawbacks of 2) and 3) are in the area of local development with make shell and make env. These are new features that weren't there before the recent test-suite rework, so making them worse wouldn't be the end of the world.

However, these features also are some of the niceties that the test-suite rework has allowed for, and throwing them away would be a pity.

dmnks commented 1 year ago

Problem

It turns out that, while this is quite easy to do on Fedora with DNF and unshare(1), it is not as easy on other distros that I've tried, namely OpenSUSE where Zypper doesn't seem to like being run through unshare(1), and would likely require sudo instead

Hmm, not sure what I did wrong when trying this out originally, but I've just tried again with unshare -rm --map-auto zypper --root ... and it seemed to work just fine, installing the filesystem into the specified directory.

dmnks commented 1 year ago

Ah, OK, I think this is the error I saw initially (and now too):

error: unpacking of archive failed on file /dev: cpio: chown failed - Device or resource busy
error: filesystem-84.87-12.1.i586: install failed
( 4/27) Installing: filesystem-84.87-12.1.i586 .........................................................................[error]
Installation of filesystem-84.87-12.1.i586 failed:
Error: Subprocess failed. Error: RPM failed: Command exited with status 1.

It's still possible to select "ignore" in the prompt and continue without any further errors, though.

dmnks commented 1 year ago

Anyway. Having slept on this a couple of times, it's become clear that we just don't want to deal with all this bootstrapping business and basically re-implement mkosi (which we can't use for other reasons as outlined above).

So, circling back to where we started, all we really need is to outsource the base image creation to OCI and be done with it. That means option 2 from the comment above is what I'm going to implement.

Thanks for your attention :laughing:

pmatilai commented 1 year ago

Actually, adding dbus-libs to the Dockerfile wasn't needed because it already contains it šŸ˜„ It gets pulled in as a dependency of dbus-devel which we install there in order to build RPM (it's not needed in mktree.fedora obviously).

Oh, but it wasn't added to the Dockerfile but to mktree.fedora, where on that third (?) level of inception dbus-devel is not installed, because that's just rpm's own build dependency whereas mktree.fedora prepares an environment for just the test-suite. :smile:

One could also look at that from the point of view of separating rpm's build-requires and test-requires, but it doesn't seem meaningful. We'll want to be able to do stuff like use cmake to compile something against rpm in the test-suite (but that's yet another story). The test environment equaling the build environment seems the path of least maintenance. As with the current CI environment, AIUI. OTOH I could also be utterly confused :zany_face: :smile:

Edit: I remember now (after re-reading some of the commentary in this ticket): the mktree.fedora case was needed to allow testing the locally built binary against that distro version, whereas the CI case is different. That was how and why it was so special. I'll just stock on some popcorn and sit back to see what happens in this space :grin:

dmnks commented 1 year ago

Oh, but it wasn't added to the Dockerfile but to mktree.fedora, where on that third (?) level of inception dbus-devel is not installed, because that's just rpm's own build dependency whereas mktree.fedora prepares an environment for just the test-suite. šŸ˜„

Yup, I can see how this is confusing, and that's one of the reasons I want to simplify this, by only having one way of creating the image.

Nevertheless, the idea of a mktree.distro script was to bootstrap an image with only the runtime and test deps, since the build deps are irrelevant to the test-suite. That's why you needed to add dbus-libs explicitly to the DNF command line there.

The Dockerfile (and mktree.podman) only existed as a universally portable variant of the above which also builds RPM in a container. And, as an optimization, it reuses the same image for the testing too, instead of setting up a separate tree with mktree.fedora afterwards, because the Dockerfile is based on Fedora and thus already contains the runtime deps for RPM :smile:

One could also look at that from the point of separating rpm's build-requires and test-requires, but it doesn't seem meaningful. We'll want to be able to do stuff like use cmake to compile something against rpm in the test-suite (but that's yet another story). The test-environment equaling the build-environment seems the path of least maintenance. As with the current CI environment, AIUI. OTOH I could also be utterly confused šŸ¤Ŗ šŸ˜„

Confused perhaps (can't blame you, lol), but also on point. Indeed, our goal is to make the build and test environments equal in terms of the underlying libraries being ABI-compatible (i.e. you build against library 1.X.Y and you run with library 1.X.Y), but not necessarily in terms of the actual root filesystem.

The simplest way to do this is sudo make install and then sudo ./rpmtests. But we don't want to force the poor developer to dump their RPM snapshot into their workstation system (and to risk getting their files purged by a forgotten "rm -rf" somewhere in the test-suite), so we separate the test environment through containerization :smile:

Slightly off-topic:

One way to avoid the image management altogether would be to simply reuse the root filesystem on the host as the lowerdir (in OverlayFS terms) and layer an RPM install on top. However, that requires real root privileges. Plus, we wouldn't have full control over the test environment, since random stuff installed on the host could interfere with the tests and/or cause weird failures.
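
For the record, that idea would look roughly like this (requires real root; not what we actually do):

```sh
# Layer a writable directory over the host's own root filesystem.
mkdir -p /tmp/rpmtest/upper /tmp/rpmtest/work /tmp/rpmtest/merged
mount -t overlay overlay \
    -o lowerdir=/,upperdir=/tmp/rpmtest/upper,workdir=/tmp/rpmtest/work \
    /tmp/rpmtest/merged
# ... make install into /tmp/rpmtest/merged and run the tests there ...
```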

Another way would be to reflink the necessary libraries from the host. That would be limited by the filesystem in use, e.g. Btrfs and XFS support reflinks but ext4 doesn't. Also, we would have to somehow obtain the list of files to reflink, which could possibly be done with something like rpm -ql libfoo; however, that still wouldn't guarantee a 1:1 copy of the given package since there are scriptlets which wouldn't be run. If we added a feature to RPM to "clone" or "rebase" a package to a different root filesystem, that could be used :smile: But that's really not something to worry about now.

EDIT: Of course, we could always fall back to a plain cp on non-reflinkable filesystems, which in my earlier experiments was still faster than DNF/Dockerfile builds. But again, it's not as straightforward, so I didn't explore it further :smile:
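
A sketch of the reflink idea, for completeness (hypothetical; conveniently, cp --reflink=auto silently falls back to a plain copy on filesystems without reflink support):

```sh
# Clone one package's files from the host into the test tree; scriptlet
# effects are still missing, as noted above.
rpm -ql glibc | while read -r f; do
    [ -f "$f" ] && cp --reflink=auto --parents "$f" "$RPMTEST"
done
```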

dmnks commented 1 year ago

We'll want to be able to do stuff like use cmake to compile something against rpm in the test-suite (but that's yet another story).

Oh, this is actually a very good point. It's not something I had in mind previously, but it's yet another reason to just reuse the same Dockerfile image for both building and testing, indeed :smile: (which is in line with the planned solution of this ticket).

Edit: I remember now (after re-reading some of the commentary in this ticket): the mktree.fedora case was needed to allow testing the locally built binary against that distro version, whereas the CI case is different. That was how and why it was so special. I'll just stock on some popcorn and sit back to see what happens in this space šŸ˜

Ack, yes, that's the thing. We want both portability and integration with a local build for native development.

And yup, popcorn is in order, indeed :laughing:

dmnks commented 1 year ago

Oh, Ubuntu 22.04 LTS (our lowest common denominator) ships Podman in the official repos. Nice. We can then drop Docker support altogether and just use the same backend for local and CI purposes. Most likely (still need to verify that the Podman version in Ubuntu works the same way).

pmatilai commented 1 year ago

Oh, that's a nice bonus. Assuming of course it actually works :smile:

Conan-Kudo commented 1 year ago

Keep in mind, we need a way to run it directly on the host, because all this fanciness you're talking about doesn't exist on non-Linux platforms. In particular, I would like to be able to run the test suite for RPM on macOS still. šŸ˜…

Conan-Kudo commented 1 year ago

I will also point out there are openSUSE containers that use DNF too. šŸ˜‰

dmnks commented 1 year ago

Keep in mind, we need a way to run it directly on the host, because all this fanciness you're talking about doesn't exist on non-Linux platforms. In particular, I would like to be able to run the test suite for RPM on macOS still. šŸ˜…

In essence, what the test-suite needs is a filesystem tree that contains an installation of RPM to test and its runtime dependencies (i.e. a make install as root in a development VM would do just fine), and a way to quickly make writable copies (e.g. copy-on-write snapshots) of that tree. Then, a plain chroot into those snapshots could be used to isolate the individual tests.

None of this is inherently Linux-specific; however, to make it reasonably fast and efficient, you need something like OverlayFS, which is currently only available on Linux (and reportedly on some BSDs through fuse-overlayfs, but that's likely slower than a native kernel implementation).

And then, instead of chroot, we want to (and do) make use of containers in our tests, now that we've freed ourselves from fakechroot. They offer many more possibilities, including proper uid/gid mapping for file ownership testing, unsharing more than just the filesystem namespace, etc.

One way to make the test-suite POSIX compatible would be to identify tests that require a proper container and disable those on platforms that don't have containers. There, chroot could be used instead. But I'm not sure if we really want to go in that direction.

Even then, while chroot is available on most UNIX platforms, OverlayFS-like functionality is not, or varies greatly among systems (APFS on macOS comes close, perhaps) and would need special handling. This really is not something we (the core development team at Red Hat) have the resources and/or expertise to cover. We rely on the community to do this work.

One thing that's not mandatory, though, is rootless operation. We can always just assume that the test-suite is run in a development/throwaway VM. That's one less thing to worry about when it comes to supporting non-Linux systems.

dmnks commented 1 year ago

One more thing to add to the above: This ticket is about using OCI images for the test root creation, on Linux distributions. Any kind of non-Linux work would need to be tracked in a separate ticket.

Conan-Kudo commented 1 year ago

None of this is inherently Linux-specific; however, to make it reasonably fast and efficient, you need something like OverlayFS, which is currently only available on Linux (and reportedly (https://github.com/rpm-software-management/rpm/pull/2559#issuecomment-1633335921) on some BSDs through fuse-overlayfs, but that's likely slower than a native kernel implementation).

FUSE exists on macOS too, though I don't know if fuse-overlayfs works with it.

dmnks commented 1 year ago

I will also point out there are openSUSE containers that use DNF too. šŸ˜‰

Interesting, thanks for sharing. I don't think it changes anything discussed here so far, though :smile:

dmnks commented 1 year ago

None of this is inherently Linux-specific; however, to make it reasonably fast and efficient, you need something like OverlayFS, which is currently only available on Linux (and reportedly (#2559 (comment)) on some BSDs through fuse-overlayfs, but that's likely slower than a native kernel implementation).

FUSE exists on macOS too, though I don't know if fuse-overlayfs works with it.

Oh, nice. Also, see: https://github.com/containers/fuse-overlayfs/issues/140

But again, not something I'm going to invest time into. Anyone interested, feel free to investigate and/or submit patches ;)

Conan-Kudo commented 11 months ago

Well, apparently this is now a thing: https://macoscontainers.org/

dmnks commented 11 months ago

Yep, thanks. I noticed this too on Hacker News yesterday and was almost going to post the same here :smile:

DaanDeMeyer commented 6 months ago

@dmnks Next time shout at me about what's missing! I would have been happy to discuss improvements to mkosi to make it work for your use case. (I would have commented but had no clue you were considering it.)

dmnks commented 3 months ago

Oh, I'd missed your comment @DaanDeMeyer, sorry!

Yep, I indeed did consider mkosi initially and even had a branch where I played with it for a couple of weeks. However, later I realized the philosophy of mkosi wasn't completely in line with our requirements (and that's OK!), which were:

  1. Reuse the local, native cmake build directory and install those artifacts into the target image, instead of doing a separate build on the side (like mkosi build does). Basically, we want the test-suite to exercise the user's build directory (that they'd have anyway) rather than keeping an additional one on the side. Looking back, this isn't really super critical and actually has some drawbacks (mentioned in some tickets that I've opened here since then, heh) but that was the "happy path" workflow before moving away from fakechroot so I wanted to keep it as much as possible.

  2. Allow for cross-building images (e.g. build a Fedora image on an Ubuntu host). This is of course the core feature of mkosi, however it requires a reasonably recent OS version (and/or mkosi) on the host, as well as ~a reasonably recent package manager of the OS you're targeting~ the right package selection for your project's dependencies. This is a hard requirement for us because we need to run the test-suite in a CI environment where we can't control the host/VM OS selection. Currently, we use GitHub Actions which only has Ubuntu 22.04 LTS and, IIRC, I had issues with the version of mkosi and the (ancient, unmaintained) RPM stack available there. And building a target image based on Ubuntu wouldn't work for us either because Ubuntu isn't RPM's main target platform and thus doesn't have the latest dependencies.

  3. Rootless image building. I know this has been supported in mkosi for a while now, but it wasn't when I was considering it (so it was another factor).

So it eventually turned out that using OCI images was the best solution for us. The tooling is ubiquitous and already preinstalled on the Ubuntu VMs in the CI (as well as typical developer systems).

In retrospect, though, I would've talked to you, indeed, if only to understand the whole landscape where mkosi operates better. There certainly are nice features in mkosi that we could use; OTOH, right now we're happy with the OCI setup and wouldn't gain much (if anything) by switching.

dmnks commented 3 months ago

... however it requires a reasonably recent OS version (and/or mkosi) on the host, as well as a reasonably recent package manager of the OS you're targeting.

Oops, I got this wrong:

Mkosi doesn't require the target OS's package manager. It uses the native one, whichever it is on the host you're building the image on. The issue I wanted to get across was that the package selection is still tied to the native package manager (e.g. APT on Ubuntu).

dmnks commented 3 months ago

Argh... :facepalm: :smile: re-reading mkosi's man page again, it of course supports multiple Distribution= values... meaning that, if that distro's package manager is available on your host, it'll be used. Either way, the point still kinda holds :smile:

DaanDeMeyer commented 3 months ago

There have indeed been quite a few improvements to mkosi lately so I understand that it might not have been suitable when you were working on this.

Note that when it comes to building cross building, we have CI in mkosi that verifies that verifies that mkosi can do cross distribution image builds for all supported distributions. The only combos that aren't directly supported are building Arch/Ubuntu/Debian images from OpenSUSE as they don't package apt or pacman. However, mkosi also supports so called tools trees, where it first builds an image with all necessary tools and then uses that image to build the actual image. So on OpenSUSE, you can build a Fedora tools tree and use that to build all other supported distributions. This is what we use in the systemd repository to build images on the Github Actions CI runners. We simply configure ToolsTree=default and ToolsTreeDistribution=fedora and mkosi will first build a Fedora image with all the latest and greatest tooling and then use that to build the actual image. Of course you still need the package manager on the host system to build the tools tree, but mkosi's CI makes sure that we're notified whenever something breaks in that area.

Re-using the local build directory might indeed be somewhat more difficult, since you would need to trick CMake into not wanting to reconfigure and rebuild when installing. But there's also no guarantee that the local build directory would actually work unless the dependencies are roughly the same as on the host.

Anyway, feel free to email me if you ever feel the itch to switch to mkosi.