openSUSE / transactional-update

Atomic updates for Linux operating systems

Adding support for performing updates from a container image #128

Open: rdoxenham opened this pull request 2 months ago

rdoxenham commented 2 months ago

In this PR I'm adding support for transactional-update to consume a container image (or OCI artefact) as the source for the next boot snapshot. This is very important functionality: it lets customers build, distribute, and validate their operating system images with a standard container image workflow. They can build OS images from Dockerfiles, store and retrieve them from a standard image registry, and run them through standard SBOM and vulnerability checkers.

This PR specifically addresses the requirement to upgrade an existing machine, regardless of how it was deployed, to a new image-based snapshot. For the initial (day-1) deployment, tooling such as kiwi-ng system stackbuild can be used to build bootable raw images and SelfInstall ISOs from the same container image; this is already a supported feature.

The approach taken in this PR:

I could use some help making sure I'm making the correct calls, as I'm mixing tukit callext with direct use of {SNAPSHOT_DIR}. I suspect there's a more elegant and safer way of achieving this, which is why the PR is labelled as a work in progress for now.

Further, we likely need to run some validations to make sure that the target image is actually bootable. This code forces a dracut and grub2-mkconfig run, but I can see this being an area where it would be easy to leave the system difficult to boot and rescue. I'm not an expert here, so I'm wondering whether there are checks we could execute, aborting the snapshot if they fail. Documentation on how to build an image will be very important, especially regarding partitions (or btrfs subvolumes that are ignored), but we should also run some additional sanity checks on the new snapshot rather than blindly closing it and letting the user reboot when we can't offer a decent level of confidence that the reboot will succeed.
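As a starting point for those checks, here is a minimal sketch of a pre-close sanity check as a standalone shell function. It is not what this PR implements, and every detail is an assumption: the name check_snapshot_root is hypothetical, it expects the new snapshot's root (with its /etc view) mounted at the path given as its argument, and it assumes x86_64 SUSE file naming (/boot/vmlinuz-<version>, /boot/initrd-<version>) plus grub2-mkconfig output at /boot/grub2/grub.cfg.

```shell
#!/bin/bash
# Hypothetical pre-close sanity check for a new snapshot root.
# Assumptions (not from transactional-update itself): the snapshot is mounted
# at $1, kernels are /boot/vmlinuz-<version>, initrds are /boot/initrd-<version>
# (x86_64 SUSE naming), and grub2-mkconfig wrote /boot/grub2/grub.cfg.
check_snapshot_root() {
    local root="$1" kernel ver found=0

    # The image must identify itself via os-release.
    if [ ! -s "$root/usr/lib/os-release" ]; then
        echo "missing or empty $root/usr/lib/os-release" >&2
        return 1
    fi

    # Without an fstab the snapshot is unlikely to mount its subvolumes.
    if [ ! -s "$root/etc/fstab" ]; then
        echo "missing or empty $root/etc/fstab" >&2
        return 1
    fi

    # Every kernel needs a matching, non-empty initrd from the dracut run.
    for kernel in "$root"/boot/vmlinuz-*; do
        [ -e "$kernel" ] || continue      # glob matched nothing
        ver="${kernel##*/vmlinuz-}"
        if [ ! -s "$root/boot/initrd-$ver" ]; then
            echo "kernel $ver has no initrd" >&2
            return 1
        fi
        found=1
    done
    if [ "$found" -eq 0 ]; then
        echo "no kernel found under $root/boot" >&2
        return 1
    fi

    # The grub2-mkconfig output must exist and reference at least one kernel.
    if ! grep -q 'vmlinuz-' "$root/boot/grub2/grub.cfg" 2>/dev/null; then
        echo "grub.cfg missing or has no kernel entries" >&2
        return 1
    fi
    return 0
}
```

A caller would run check_snapshot_root against the mounted snapshot after the dracut and grub2-mkconfig steps, and discard the snapshot on a non-zero exit status instead of closing it. Checks like these only catch missing artifacts; they can't prove that the image will actually boot.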

For testing, I used this on top of SLE Micro 5.5 with a test container image available at registry.opensuse.org/home/roxenham/slemicro/containers/edge/sle-micro/5.5:20240726. This is a very simple image that mirrors the standard SLE Micro 5.5 packages as defined in the original Kiwi image definition file, i.e. it's not cut down; it has the base set of packages typically installed, as defined in the "Default" profile. Of course, it's perfectly possible to modify this image to suit. For example, adding a package is as simple as building and pushing a new image (note that the suseconnect step is only required for commercially registered images; it isn't needed for Leap Micro or MicroOS):

% cat Dockerfile
FROM registry.opensuse.org/home/roxenham/slemicro/containers/edge/sle-micro/5.5:20240726
RUN suseconnect -r <regcode> && zypper --gpg-auto-import-keys ref \
     && zypper in -y nvidia-open-driver-G06-signed-kmp-default \
     && suseconnect -d && zypper clean -a

ARG IMAGE_REPO=unknown
ARG IMAGE=unknown
ARG IMAGE_TAG=unknown

RUN sed -i '/IMAGE/d' /usr/lib/os-release && \
    sed -i '/TIMESTAMP/d' /usr/lib/os-release && \
    echo IMAGE_REPO=\"$IMAGE_REPO\"              >> /usr/lib/os-release && \
    echo IMAGE_TAG=\"$IMAGE_TAG\"                >> /usr/lib/os-release && \
    echo IMAGE=\"$IMAGE_REPO:$IMAGE_TAG\"        >> /usr/lib/os-release && \
    echo TIMESTAMP="`date +'%Y%m%d%H%M%S'`"      >> /usr/lib/os-release

% podman build -t harbor.rancher.rdoxenham.com/slemicro/5.5:$(date +'%Y%m%d') \
    --build-arg "IMAGE=harbor.rancher.rdoxenham.com/slemicro/5.5:$(date +'%Y%m%d')" \
    --build-arg "IMAGE_TAG=$(date +'%Y%m%d')" \
    --build-arg "IMAGE_REPO=harbor.rancher.rdoxenham.com/slemicro/5.5" .

% podman push harbor.rancher.rdoxenham.com/slemicro/5.5:20240727
(...)
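One small hardening of the build step above: each $(date +'%Y%m%d') is evaluated separately, so a build that straddles midnight could stamp one date into os-release and tag or push another. The sketch below computes the tag once and reuses it everywhere. The function name build_and_push is hypothetical, and the default repository is the one used above. The IMAGE build-arg is left out because the Dockerfile already composes IMAGE from IMAGE_REPO and IMAGE_TAG.

```shell
#!/bin/bash
# Hypothetical helper: build and push an OS image with a tag computed once.
# $1 overrides the repository; the default is the registry path used above.
build_and_push() {
    local repo="${1:-harbor.rancher.rdoxenham.com/slemicro/5.5}"
    local tag
    tag=$(date +'%Y%m%d')   # evaluated once, reused for tag, build-args, and push

    # IMAGE is not passed: the Dockerfile derives it from IMAGE_REPO and IMAGE_TAG.
    podman build -t "$repo:$tag" \
        --build-arg "IMAGE_REPO=$repo" \
        --build-arg "IMAGE_TAG=$tag" . &&
        podman push "$repo:$tag"
}
```

Running build_and_push with no arguments reproduces the build and push above for the current date, and only pushes if the build succeeded.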

This image can then be used as input to the code provided in this PR:

# transactional-update apply-oci --image harbor.rancher.rdoxenham.com/slemicro/5.5:20240727
(...)

[after reboot]

# cat /etc/os-release
NAME="SLE Micro"
VERSION="5.5"
VERSION_ID="5.5"
PRETTY_NAME="SUSE Linux Enterprise Micro 5.5"
ID="sle-micro"
ID_LIKE="suse"
ANSI_COLOR="0;32"
CPE_NAME="cpe:/o:suse:sle-micro:5.5"
GRUB_ENTRY_NAME="SLE Micro"
IMAGE_REPO="harbor.rancher.rdoxenham.com/slemicro/5.5"
IMAGE_TAG="20240727"
IMAGE="harbor.rancher.rdoxenham.com/slemicro/5.5:20240727"
TIMESTAMP=20240727173117

# rpm -qa | grep nvidia-open-driver
nvidia-open-driver-G06-signed-kmp-default-550.90.07_k5.14.21_150500.55.65-150500.3.47.1.x86_64
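Because the Dockerfile appends IMAGE to /usr/lib/os-release, the post-reboot check above can also be scripted, for example to confirm that a node rebooted into the expected image. A minimal sketch follows; booted_image_matches is a hypothetical name, and it relies on os-release being shell-sourceable, which the os-release format is designed for.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if the booted image equals $1.
# Reads the IMAGE key that the Dockerfile above appends to os-release;
# $2 optionally overrides the file (defaults to /etc/os-release).
booted_image_matches() {
    local expected="$1" release="${2:-/etc/os-release}" image
    # Source in a subshell so os-release variables don't leak into the caller.
    image=$(. "$release" && printf '%s' "${IMAGE:-}")
    [ -n "$image" ] && [ "$image" = "$expected" ]
}
```

For example: booted_image_matches harbor.rancher.rdoxenham.com/slemicro/5.5:20240727 && echo "running the expected image"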

Some further guidance from the TU community would be appreciated. Thanks a lot!

[1] This approach aligns well with how the bootc project persists /etc configuration across image updates (https://containers.github.io/bootc/filesystem.html#etc). That lets configuration such as OS registration, package repositories, static network configuration, and various other components persist rather than being overwritten by the OS upgrade. One question we may want to answer, however, is whether to use /usr/etc so that specific files in /etc can be forcibly overwritten by their counterparts in /usr/etc, giving users a release valve for changing certain persistent configuration over time as part of a 3-way merge (this PR doesn't do this yet).
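To make the 3-way merge idea more concrete, here is a minimal per-file decision rule in shell. It only illustrates the general technique; it is not what this PR, bootc, or transactional-update implements. merge_decision is a hypothetical name, and its three arguments stand for the file as shipped in the previous image, the file as shipped in the new image, and the file in the running system's /etc. It prints a decision and modifies nothing.

```shell
#!/bin/bash
# Hypothetical 3-way merge rule for one /etc file across an image update.
#   $1 old: file as shipped in the previous image
#   $2 new: file as shipped in the new image
#   $3 cur: file in the running system's /etc
# Prints take-new, remove, or keep-local; changes nothing on disk.
merge_decision() {
    local old="$1" new="$2" cur="$3"

    if [ -e "$cur" ]; then
        # Present locally: treat it as untouched only if identical to the old image's copy.
        if [ -e "$old" ] && cmp -s "$cur" "$old"; then
            if [ -e "$new" ]; then echo "take-new"; else echo "remove"; fi
        else
            echo "keep-local"   # locally modified or locally created
        fi
    elif [ -e "$old" ]; then
        echo "keep-local"       # shipped before but deleted locally: keep it deleted
    elif [ -e "$new" ]; then
        echo "take-new"         # newly shipped file
    else
        echo "keep-local"       # exists nowhere: nothing to do
    fi
}
```

The force-overwrite release valve discussed above would be an extra rule checked before this one, for example a list of paths that always take the new image's copy regardless of local changes.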

lz-coder commented 1 month ago

any plans for this to be merged?

laenion commented 1 month ago

> any plans for this to be merged?

Yes, I just have to find the time to finally review it! Sorry for the delay!

joostwestra commented 3 weeks ago

Does registry.opensuse.org/home/roxenham/slemicro/containers/edge/sle-micro/5.5:20240726 no longer exist?

rdoxenham commented 3 weeks ago

Hi @joostwestra - yes, that system rebuilds every day and discards older tags, so you can either select the latest tag or just build off latest:

agracey commented 1 week ago

Any update on timeline for a review of this?

puneetlws commented 1 week ago

Now that SLE Micro 6 has also been released, would it be possible to add 5.5-to-6 migration through the above mechanism? Right now, migrating to 6 uses the 'transactional-update migration' command, which internally calls zypper migration and therefore requires connectivity to SUSEConnect.

joostwestra commented 1 week ago

We tested this feature by manually patching it in, and we think it is valuable for a wide audience. Is there anything we can do to help get this feature picked up?

laenion commented 1 week ago

I'll include it in the next major release (hopefully out soon); I'm just currently prioritizing finishing the work on moving the /etc overlays to btrfs subvolumes.

agracey commented 6 days ago

@laenion Thank you!