Support UKI - Githubissues

cgwalters commented 1 year ago

See https://github.com/uapi-group/specifications/blob/main/specs/unified_kernel_image.md and https://fedoraproject.org/wiki/Changes/Unified_Kernel_Support_Phase_1

There are two major points here:

UEFI only

We'll need to add a UEFI backend to ostree, which explicitly controls the UEFI boot ordering via e.g. efibootmgr instead of using the /boot/loader/entries stuff.

Kernel cmdline → rootfs

One goal of the UKI work is to have generic Linux distributions sign both the kernel and initramfs and stock kernel cmdline. However, ostree today embeds the target rootfs in the kernel cmdline - this creates a recursion issue.

Option: ostree=N and symlinks and using systemd-stub credentials

We can change ostree-prepare-root in the initramfs to automatically find the latest symlink in /sysroot/ostree - we effectively do almost this with /ostree/boot.[01] today.

(Something to debate here is whether we require an ostree= karg at all; our initramfs code is conservative today in making ostree opt-in, but for people who are requiring it, we could also just add a flag to default it to on, finding the latest deployment)

The interesting thing here is what it looks like to fetch a userspace only update.

That flow would look like this:

Initial system deployment has one UKI in ESP
ostree admin upgrade or bootc update or whatever, fetch new rootfs but not a new kernel UKI
ostree defaults to enabling rollback today, so for systemd-stub we'd copy the existing UKI, and add a credential that tells the initramfs to look for the previous deployment

Option: Parsing the UKI filename

See https://github.com/ostreedev/ostree/issues/2753#issuecomment-1488587533

ricardosalveti commented 1 year ago

I think what we'd do instead is have the initramfs automatically find the latest symlink in /sysroot/ostree - we effectively do almost this with /ostree/boot.[01] today.

How would we know from the initramfs when a rollback was performed? I'm thinking of the scenario in which the bootloader decides to roll back because the new deployment/update is not good enough (wasn't confirmed). Currently this can be done quite easily by booting the previous deployment (previous initramfs), which also has the previous ostree argument in place.

dbnicholson commented 1 year ago

UEFI only

We'll need to add a UEFI backend to ostree, which explicitly controls the UEFI boot ordering via e.g. efibootmgr instead of using the /boot/loader/entries stuff.

What's wrong with using boot loader entries? Wouldn't we expect that the UEFI boot loader participating in the scheme (e.g. sd-boot) to support the boot loader spec? Mucking with the UEFI boot entries doesn't sound that pleasant to me. I'm pretty sure sd-boot deals with it fine as we've been using it with a UKI on a product for a few years. The only changes we have to make are horrifying ones to deal with the lack of symlinks on VFAT (#1719). I guess using UEFI boot variables would side step that issue, though.

dbnicholson commented 1 year ago

Also, if you're building a UKI, the initramfs is part of it and there's no need for ostree to find it. Are you suggesting the ostree take the separate kernel and initramfs and generate a UKI?

ericcurtin commented 1 year ago

Is the "how to find the rootfs" problem the chicken-and-egg problem, in that you need to populate the "ostree=" karg in the UKI, but you only know what that karg should be after you commit? So that you can only boot the n-1 commit in that case?

I had thoughts on this, you could write an extra value client side, maybe as a new entry "ostree" in the bls. So you could do:

title Fedora Linux 36.20221024.0 (Silverblue) (ostree:0)
version 2
options rhgb quiet root=UUID=7d8417b0-eb2d-4c6d-b0b1-ac72c11104d4 rootflags=subvol=root 
linux /ostree/fedora-3d1ddf0131c05a2bc1ea548f3ad426c25b03dfe672b7b5c0d725ad4417b062dc/vmlinuz-5.19.16-200.fc36.x86_64
initrd /ostree/fedora-3d1ddf0131c05a2bc1ea548f3ad426c25b03dfe672b7b5c0d725ad4417b062dc/initramfs-5.19.16-200.fc36.x86_64.img
ostree /ostree/boot.1/fedora/3d1ddf0131c05a2bc1ea548f3ad426c25b03dfe672b7b5c0d725ad4417b062dc/0

If (ostree_entry_exists) {
  read rootfs from "ostree" bls entry
}
else {
  do the previously existing read a karg way
}
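A minimal shell sketch of that lookup order, for illustration only. The `ostree` key in the BLS entry is the hypothetical field proposed above (it is not part of the Boot Loader Specification), and the function name `find_ostree_target` is made up for this example.

```sh
# Print the deployment path to boot.
# $1: path to a BLS entry file
# $2: kernel command line as a single string
find_ostree_target() {
    entry_file="$1"
    cmdline="$2"

    # Prefer the proposed (hypothetical) "ostree" key in the BLS entry.
    target="$(sed -n 's/^ostree[[:space:]][[:space:]]*//p' "$entry_file" | head -n 1)"
    if [ -n "$target" ]; then
        printf '%s\n' "$target"
        return 0
    fi

    # Otherwise fall back to the existing ostree= kernel argument.
    for arg in $cmdline; do
        case "$arg" in
            ostree=*)
                printf '%s\n' "${arg#ostree=}"
                return 0
                ;;
        esac
    done

    # Neither source named a deployment.
    return 1
}
```

In a real initramfs the second argument would typically come from `/proc/cmdline`; it is passed explicitly here so the function can be exercised outside of early boot.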

But I'm happy with whatever works :)

Encountering the same issue in the UKI-like aboot bootloader.

cgwalters commented 1 year ago

I think the main reason to embed the rootfs in the kernel cmdline is basically integration with bootloader menus - e.g. to be able to choose the previous deployment in the GRUB GUI.

However, this is not a requirement. We could instead read a value from the target root, one could imagine something as simple as a symlink /ostree/deploy/fedora-coreos/deploy/current pointing to a deployment root. (And we could also have /ostree/current be a symlink pointing to the default stateroot so we basically have a default given a root filesystem)

Perhaps a strawman here is that specifying a bare ostree value on the kernel command line would mean "use the default". We could extend this to e.g. ostree_root=fedora-coreos to support specifying a stateroot (but I'm not sure how much we really care about the multiple stateroot stuff going forward)

And then to boot the previous deployment, we support an ostree=previous or more generally ostree=n=[0..] that takes an integer value.

cgwalters commented 1 year ago

Is the "how to find the rootfs" problem the chicken-and-egg problem, in that you need to populate the "ostree=" karg in the UKI, but you only know what that karg should be after you commit?

By default, we don't do client side commits. Hence, the digest is actually fully predictable and known in advance on the build server. But certainly there is a circular dependency here for any systems which are doing fully sealed kernel command lines - we'd need to generate the rootfs and compute its digest, then patch the kernel binary (which would in theory invalidate that digest, but OTOH nothing actually reads the kernel from the rootfs; the fallout would just be things like ostree fsck would fail on that file, but we could teach the client to ignore that).

Arguably perhaps, we should have better support on the client for something like "ghosting" the kernel/initramfs from /usr/lib/modules - i.e. we ship them in the ostree commit, but at deploy time we actually prune the data from the rootfs to make clear it has migrated into the bootloader state (whether that's a separate /boot partition or UEFI).

cgwalters commented 1 year ago

What's wrong with using boot loader entries? Wouldn't we expect that the UEFI boot loader participating in the scheme (e.g. sd-boot) to support the boot loader spec? Mucking with the UEFI boot entries doesn't sound that pleasant to me.

I think many users/organizations that want to deploy UKIs will want to do so without involving any bootloader at all. But yes, we should probably also support deployment with a bootloader.

dbnicholson commented 1 year ago

What's wrong with using boot loader entries? Wouldn't we expect that the UEFI boot loader participating in the scheme (e.g. sd-boot) to support the boot loader spec? Mucking with the UEFI boot entries doesn't sound that pleasant to me.

I think many users/organizations that want to deploy UKIs will want to do so without involving any bootloader at all. But yes, we should probably also support deployment with a bootloader.

That does make sense. But what's actually unpacking the UKI in that case? Some other UEFI program? I don't believe the linux kernel itself supports booting directly from a combined kernel+initramfs PE program. Ah, sd-stub. I missed that. I guess if you're all in on UEFI and want the minimal boot environment, then even sd-boot is superfluous.

ljrk0 commented 1 year ago

What's wrong with using boot loader entries? Wouldn't we expect that the UEFI boot loader participating in the scheme (e.g. sd-boot) to support the boot loader spec? Mucking with the UEFI boot entries doesn't sound that pleasant to me.

I think many users/organizations that want to deploy UKIs will want to do so without involving any bootloader at all. But yes, we should probably also support deployment with a bootloader.

That does make sense. But what's actually unpacking the UKI in that case? Some other UEFI program? I don't believe the linux kernel itself supports booting directly from a combined kernel+initramfs PE program. Ah, sd-stub. I missed that. I guess if you're all in on UEFI and want the minimal boot environment, then even sd-boot is superfluous.

It depends on the use-case and the hardware mostly. Adding new bootloader entries to the EFI menu works only so-so on some hardware/firmware and often switching b/w different entries on boot is cumbersome (not to mention vendor-specific). Thus, having different boot environments with different kernels/deployments would be much easier with sd-boot than with "native" UEFI boot loading. This is the reason why I use sd-boot on all my systems in combination with UKIs.

ericcurtin commented 1 year ago

One thing that's not clear to me is how we deliver a UKI (is it its own rpm?), because it would be built on the osbuild side rather than on the end device...

dbnicholson commented 1 year ago

Sure, the same way you build the kernel and initramfs on an ostree system. They're just bundled together for a UKI. I think the only thing in there that doesn't fit that model is the kernel command line since ostree currently allows you to manage that locally and it often contains root, which is inherently local. Certainly you can come up with a default command line, but at a minimum you'd have to rely on something like systemd's discoverable partitions setup to not have root in there.

dbnicholson commented 1 year ago

I guess the way we do it right now at Endless is that the initramfs is generated in our ostree builder. For our systems that use a unified kernel with sd-boot, that's also generated in our ostree builder.

There's no reason they couldn't be packaged except that generating the initramfs requires installing all the dracut modules that you want in there. We decided that wasn't worth the effort and it was easier to do that in the ostree builder since it would by definition have all the modules installed.

alexlarsson commented 1 year ago

Just to be explicit, I didn't propose just shipping the commit in the detached metadata, as that is not trusted. What I proposed is to ship some public key in the initrd, sign the commit id with the private part, store it in detached metadata, and then throw away the private key.

Then the initrd can validate the commit it reads from somewhere.

travier commented 1 year ago

Posting here the result of several discussions that we've had recently:

The major change is the need to move the ostree deployment hash out of the kernel command line as the kernel command line won't be modifiable in the UKI case.

The suggested design is that ostree would take the UKI from the ostree commit, move it to the EFI partition and rename it with the following convention:

<boot entry order>.<name of the UKI>.<ostree deployment hash>

For example: 0.fedora-6.1.11-200.fc37.x86_64.ostree=ac1613dda93a56bfbef…

We would then need to add support in the initramfs to read the ostree deployment hash from the name of the UKI that has been booted instead of reading it from the kernel command line. This could be done either by reading the name from EFI variables or from the TPM event log.

# efibootmgr -v -u
BootCurrent: 0001
Timeout: 0 seconds
BootOrder: 0001
...
Boot0001* redhat HD(2,GPT,0a368044-fab0-914a-9500-218489723cfd,0x2800,0x7e000)/File(\EFI\redhat\shimx64.efi)\EFI\Linux\vmlinuz-5.14.0-282.kpq0.el9.x86_64-virt.efi
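For illustration, the naming convention above could be split apart in the initramfs roughly as follows. This is only a sketch: the function names are invented here, and it assumes the file name has already been obtained and that any trailing `.efi` suffix (not shown in the convention above) has been stripped by the caller.

```sh
# $1: UKI file name of the form "<order>.<name>.ostree=<hash>"

# Print the boot entry order (everything before the first dot).
uki_boot_order() {
    printf '%s\n' "${1%%.*}"
}

# Print the ostree deployment hash (everything after the last ".ostree=").
# Returns non-zero if the file name does not carry a hash.
uki_deployment_hash() {
    case "$1" in
        *.ostree=*) printf '%s\n' "${1##*.ostree=}" ;;
        *) return 1 ;;
    esac
}
```

Since the kernel version inside the name also contains dots, the sketch only relies on the first dot and on the literal `.ostree=` marker rather than splitting on every dot.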

We'll need to add a UEFI backend to ostree, which explicitly controls the UEFI boot ordering via e.g. efibootmgr instead of using the /boot/loader/entries stuff.

While this would be nice, I don't think it's strictly needed if we still have a bootloader (systemd-boot preferably) that is capable of booting BLS config entries.

travier commented 1 year ago

The design above could be combined with the suggestion from https://github.com/ostreedev/ostree/issues/2753#issuecomment-1488572499 and the use of composefs to verify the content of the deployment.

ericcurtin commented 1 year ago

Sorry I deleted that as I wanted to rewrite a portion, it belongs before @alexlarsson 's comment

So some automotive folk were discussing Android boot images, which are similar to UKIs in that it is a "kernel, initrd, cmdline and signature" that gets generated server-side and delivered to the client via ostree. This leaves the issue of how to deliver and boot the ostree SHA.

It is difficult to boot via karg because that has the recursive problem, how do you deliver that SHA without altering the SHA?

@alexlarsson suggested ostree detached metadata, that way you can deliver the SHA without altering the SHA, so we think this should solve the problem.

But this requires booting via an alternate means to booting via karg, and I will explore the symlink techniques @cgwalters suggested above.

Wondering what you guys think of this as a proposal? A similar technique could be used for UKIs and Android Boot Images.

ericcurtin commented 1 year ago

@travier thanks for sharing the output of the discussions here, in the case where you don't have an EFI partition available which is the case in Android Boot Images, do you think it's reasonable to move forward with @cgwalters symlink approach suggested above?

https://github.com/ostreedev/ostree/issues/2753#issuecomment-1315476200

bauen1 commented 1 year ago

We would then need to add support in the initramfs to read the ostree deployment hash from the name of the UKI that has been booted instead of reading it from the kernel command line. This could be done either by reading the name from EFI variables or from the TPM event log.

The UEFI variable LoaderImageIdentifier is set by systemd-stub, that might be a much simpler way of reliably obtaining the filename that was booted, since you probably want to support systems without a TPM.
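A rough sketch of reading that variable from a shell in the initramfs, assuming efivarfs is mounted at `/sys/firmware/efi/efivars`. The function name is made up; the GUID is the vendor GUID systemd uses for its boot loader interface variables. efivarfs files start with a 4-byte attribute field followed by the value, which for this variable is a UTF-16LE string, so the sketch simply drops NUL bytes and therefore only handles ASCII paths.

```sh
# Vendor GUID of systemd's boot loader interface variables.
LOADER_GUID="4a67b082-0a4c-41cf-b6c7-440b29bb8c4f"

# Print the string stored in an efivarfs file.
# $1: path to the efivarfs file (defaults to LoaderImageIdentifier)
read_efi_string() {
    var_path="${1:-/sys/firmware/efi/efivars/LoaderImageIdentifier-$LOADER_GUID}"
    # Skip the 4-byte attribute header, then drop NUL bytes to turn
    # ASCII-only UTF-16LE into plain text.
    tail -c +5 "$var_path" | tr -d '\000'
    echo
}
```

The path is a parameter mainly so that the decoding can be tried against a sample file on a machine that was not booted through systemd-stub.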

Reading this thread, I can't help but feel like this is getting over engineered (or rather "complex") ...

I'm personally not interested in building the UKI on the server and losing the ability to specify command line arguments; however, I think that's a requirement if you want the UKI to be signed, e.g. by the distribution itself?

Since that isn't my goal, I'm currently building the UKI on the host, supporting kernel arguments. If I need the image on the build server, e.g. for signing or attestation, I can simply take the kernel arguments and build the (fully reproducible) UKI.

alexlarsson commented 1 year ago

For example: 0.fedora-6.1.11-200.fc37.x86_64.ostree=ac1613dda93a56bfbef…

We would then need to add support in the initramfs to read the ostree deployment hash from the name of the UKI that has been booted instead of reading it from the kernel command line. This could be done either by reading the name from EFI variables or from the TPM event log.

The problem with this is that it moves the identifier for the rootfs from a trusted location (in signed uki) to a completely untrusted location (the filename). Anyone can just rename the FAT file and make it boot some other rootfs.

This is fine if you don't care about validation, but it is nowhere near enough for a secureboot trusted boot into the rootfs.

cgwalters commented 1 year ago

The problem with this is that it moves the identifier for the rootfs from a trusted location (in signed uki) to a completely untrusted location (the filename). Anyone can just rename the FAT file and make it boot some other rootfs.

Not any other rootfs; you'd include the key used to sign the composefs in the initramfs, and validate it from there.

So the problem then turns to rollback protection, and that's a nuanced topic because it's absolutely valid to want to roll back sometimes.

ericcurtin commented 1 year ago

I liked the symlink approach over EFI partition. The problem with using EFI features, is you start to depend on fully implemented UEFI, which would be nice, but it's not always the case, especially on non-x86 systems. If we could self-contain the solution as much as we can in the main rootfs partition it would be better (over using EFI partitions).

cgwalters commented 1 year ago

I liked the symlink approach

To be clear "symlink approach" = https://github.com/ostreedev/ostree/issues/2753#issuecomment-1315476200 ?

I edited that earlier comment to elaborate a bit about how rollbacks would work; so the previous bootloader entries would gain ostree=1 to mean the previous deployment (as opposed to ostree=0 being the default).

ericcurtin commented 1 year ago

I liked the symlink approach

To be clear "symlink approach" = #2753 (comment) ?

Yes, it removes the hard dependency on EFI.

I edited that earlier comment to elaborate a bit about how rollbacks would work; so the previous bootloader entries would gain ostree=1 to mean the previous deployment (as opposed to ostree=0 being the default).

travier commented 1 year ago

We need a way to choose which deployment to boot as we need to support rollbacks (rollback protection is another topic that we are not covering here and would be implemented separately). As we can not change the command line, we need a way to pass that info to the initramfs. Using the filename of the UKI is one way of doing that.

Note that this deployment hash isn't particularly trusted data: it only makes sense if the deployment exists in the rootfs. Whether or not it's a valid deployment is thus a question of whether or not we have integrity for the rootfs and that's a composefs / LUKS discussion.

You can not use that to boot an arbitrary deployment that would not be in the rootfs already.

cgwalters commented 1 year ago

@travier what problems do you see with https://github.com/ostreedev/ostree/issues/2753#issuecomment-1315476200 ?

travier commented 1 year ago

Perhaps a strawman here is that specifying a bare ostree value on the kernel command line would mean "use the default". We could extend this to e.g. ostree_root=fedora-coreos to support specifying a stateroot (but I'm not sure how much we really care about the multiple stateroot stuff going forward)

And then to boot the previous deployment, we support an ostree=previous or more generally ostree=n=[0..] that takes an integer value.

As far as I understand this involves modifying the kernel command line which is not compatible with UKIs.

travier commented 1 year ago

If we do a mapping boot-entry-number -> deployment-hash in the rootfs then that could work but that would be an additional indirection layer:

Read boot entry number from UKI file name
Open rootfs, find ostree deployment hash from symlink: /ostree/deploy/fedora-coreos/deploy/0 -> 1234567890...
Use the deployment hash

travier commented 1 year ago

Not sure how robust this would be in case of power failures as we would need to update two places at the same time every time we do a new deployment: UKI file name + deployment hash symlink.
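For the symlink half of that update, one common pattern is to create the new link under a temporary name and rename it over the old one, so a reader sees either the old or the new target. The sketch below is illustrative only: `atomic_symlink` is a made-up helper, `mv -T` is a GNU coreutils option, and it does nothing for the UKI file name on VFAT, which would still be a separate step.

```sh
# Atomically point a symlink at a new target.
# $1: symlink path, $2: new target
atomic_symlink() {
    link="$1"
    target="$2"
    tmp="$link.tmp.$$"

    ln -s "$target" "$tmp"
    # rename(2) replaces the old link in one step; -T (GNU coreutils)
    # stops mv from moving the temporary link into a directory that the
    # old link points at.
    mv -T "$tmp" "$link"
}
```

Durability across power loss would additionally depend on the filesystem flushing the rename, which this sketch does not attempt to guarantee.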

cgwalters commented 1 year ago

But we don't change the UKI for every deployment. We don't want to have to touch the kernel config when only userspace changes in general, right?

travier commented 1 year ago

We would have one UKI per boot entry in all cases, even if they are the same files as vfat has no hardlinks.

travier commented 1 year ago

We can share kernel & initramfs right now because we use BLS configs to set the roothash, etc. We can't use that with UKIs.

travier commented 1 year ago

@ericcurtin All the work on UKIs assumes you're using UEFI, Secure boot and a TPM.

cgwalters commented 1 year ago

Yes, but they have a case where they want to boot via adb, which is similar in architecture but not exactly UKI.

Also, as far as I know nothing stops one from doing something UKI-like for e.g. zipl.

From the perspective of the bootloader, they can't tell the difference between a kernel with an initramfs embedded and one without!

So the argument here is really to just change the ostree default to reading a symlink and not to inject a hash into the kernel cmdline.

ericcurtin commented 1 year ago

s/adb/Android Boot Image/g; it's basically what UKIs took influence from, and it's commonly used on Android devices, Chromebooks, automotive hardware, etc.

From the perspective of the bootloader, they can't tell the difference between a kernel with an initramfs embedded and one without!

Yes this is true

travier commented 1 year ago

I don't understand how this would work. How do you know which boot entry you booted with a UKI if you can't pass any info via the bootloader or the filename?

travier commented 1 year ago

The only case I can think of which uses UKIs without UEFI and without Secure Boot is to rely on the TPM to measure the UKI to unlock a LUKS encrypted rootfs. In this case you also need something else to pass the information about which boot entry has been booted. I don't know if GRUB or other bootloaders have that kind of interface. U-Boot has an EFI mode that can emulate EFI behavior.

travier commented 1 year ago

Note that the whole design with UKI relies on getting rid of boot loader config entries and relying on the Type 2 BLS specification: https://uapi-group.org/specifications/specs/boot_loader_specification/#type-2-efi-unified-kernel-images

vittyvk commented 1 year ago

FWIW, there's a proposal (and a proof-of-concept in works) to add an allowlist of options which are allowed to change to systemd-stub: https://github.com/systemd/systemd/issues/24539#issuecomment-1488571035 If implemented, ostree can probably be specified there.

It is still an open question how to pass these additional options to UKIs, especially in the absence of a 'real' bootloader when e.g. UKI is booted directly from shim. This can probably be done through a UEFI variable or a file on ESP (e.g. systemd-stub can read BLS config), or something else.

cgwalters commented 1 year ago

I don't understand how this would work. How do you know which boot entry you booted with a UKI if you can't pass any info via the bootloader or the filename?

Inherently, the job of the initramfs is to mount the root filesystem. A common default is to mount the root filesystem via e.g. root=UUID= or so. ostree just extends this one step with a notion of "choose userspace snapshot" via ostree=, which is very similar to e.g. a btrfs root=subvol type thing.

So the initramfs does:

mount root filesystem
If we have ostree=0 in the kernel command line, use the deployment root to which the symlink /ostree/0 is pointing.
Similar for ostree=1 etc.
If composefs is in use, verify its signature using the key embedded in the initramfs before mounting
If composefs is not in use, set up the bind mount/chroot dance as ostree does today
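The selection part of those steps could look roughly like the sketch below. It assumes the physical root is already mounted and that `/ostree/<N>` symlinks exist as described; neither the symlink layout nor the function name `resolve_deployment` exists in ostree today. Composefs signature verification and the bind mount setup are deliberately left out.

```sh
# Print the deployment root selected by ostree=<N>.
# $1: mount point of the physical root filesystem
# $2: kernel command line as a single string
resolve_deployment() {
    sysroot="$1"
    index=""

    # Pick up the numeric ostree=<N> argument (the last one wins).
    for arg in $2; do
        case "$arg" in
            ostree=[0-9]*) index="${arg#ostree=}" ;;
        esac
    done
    [ -n "$index" ] || return 1

    # The (hypothetical) /ostree/<N> symlink names the deployment root.
    link="$sysroot/ostree/$index"
    [ -L "$link" ] || return 1

    # Canonicalize so the caller gets an absolute path under $sysroot.
    readlink -f "$link"
}
```

Because the index is resolved inside the rootfs, a userspace-only update would only need to repoint the symlinks and would leave the UKI untouched.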

Another way to say it is: with a UKI-based apt/yum system, you don't need to change the UKI or bootloader when you apt install cowsay locally.

Again, the bridge between UKI and verified userspace is a key embedded in that UKI.

travier commented 1 year ago

For https://github.com/systemd/systemd/issues/24539 to work we need BLS Type 1 entries (config files) and bootloader support to extend the UKI kernel command line with the options passed into that config file.

travier commented 1 year ago

If we have ostree=0 in the kernel command line, use the deployment root to which the symlink /ostree/0 is pointing.

How do you set that in the kernel command line and how do you update that when you change the order of deployments?

travier commented 1 year ago

We could generate a random hash and include it both in the UKI kernel command line and setup the symlinks in the rootfs but that would be another indirection like I mentioned in https://github.com/ostreedev/ostree/issues/2753#issuecomment-1488796156.

cgwalters commented 1 year ago

You're right, I wasn't covering a detail here. At this point though the thread is unwieldy, so I've amended the initial comment here. I think systemd-stub credentials are already a way to pass this data and it's what it's designed for.

That said, I also do think we can't design solely for systemd-stub. A very interesting case that's entwined with all of this is whether systems using ostree want to explicitly support locally-initiated rollback.

If you don't (and I think that's valid!) then there's no need for a "fallback" UKI that would appear as a separate bootable entry at all. Instead, it'd be up to userspace (whether initramfs or real root) to verify health and locally initiate a change in the default UKI/rootfs pair.

travier commented 1 year ago

Using credentials is indeed also an option.

Note that this requires systemd-stub so is UEFI only, so this is the same case as the filename option but easier to manage I agree ~~and lets us share the UKIs~~.

travier commented 1 year ago

For a kernel binary called foo.efi, it will look for files with the .cred suffix in a directory named foo.efi.extra.d/

Looks like this won't let us share UKI if I understand correctly.

bauen1 commented 1 year ago

I'll post my current test setup here, simply because it might be useful to someone; obviously it won't be usable for the use case discussed here (Firmware SecureBoot, i.e. with Microsoft's keys).

It assumes, that the UEFI implementation is properly done, i.e. allows easily choosing between multiple Boot entries, creating/deleting a lot of Boot variables and modifying the BootOrder a lot isn't a problem, ...
The UKI image is around 64MiB+, and two copies are placed on the ESP for every BLS entry, if you wanted another UKI without the quiet option, or one for recovery, that can become very costly.
It supports local kernel cmdline arguments, which is incompatible with the use cases discussed here
The objcopy call appears to be the slowest part, but perhaps this could be optimized, by linking everything except the cmdline during build already.
The UKI build is reproducible
The actual deployment should be atomic by doing the following:
1. Remove Boot variables pointing to a leftover /boot/efi/EFI/bauen1-uki.$new_bootnum/UKIs
2. Remove any leftover /boot/efi/EFI/bauen1-uki.$new_bootnum/UKIs
3. Copy the UKIs to /boot/efi/EFI/bauen1-uki.$new_bootnum/UKIs
4. Synchronize the ESP filesystem
5. Create new Boot variables pointing to /boot/efi/EFI/bauen1-uki.$new_bootnum/*
6. Remove the now unnecessary Boot entries pointing to /boot/efi/EFI/$old_bootnum/*
7. Remove /boot/efi/EFI/$old_bootnum/* (And I've just realized, that I forgot to implement this part ...)
8. Synchronize the ESP filesystem

This way there should always be a set of UKIs with associated Boot entries to boot from, but I'm not really sure how atomic the update of the EFI variables is, especially the automatic modification of BootOrder by efibootmgr.

Instead of doing the Boot-entry dance, using systemd-boot would probably be easier; however, I don't really like how much "magic" systemd-boot does, and it seems easy to accidentally build an actually insecure system.

Here is the grub-mkconfig script:

#!/bin/sh

set -eu

if [ "${1:-}" != "-o" ]; then
    echo "Usage: $0 -o <cfg>"
    exit 1
fi

if [ -z "${2:-}" ]; then
    echo "Usage: $0 -o <cfg>"
    exit 1
fi

# FIXME: assert, that _OSTREE_GRUB2_IS_EFI is not set, if it has been set, then
#        ostree will use different logic, which is probably incompatible.
# FIXME: replace by using _OSTREE_GRUB2_BOOTVERSION, which also checks that we have been called by ostree
# We get called like `grub-mkconfig -o /boot/loader.0/grub.cfg`, use $2 to obtain the /boot/loader.$bootnum directory
if [ "$2" = "/boot/loader.0/grub.cfg" ]; then
    OLD_BOOTNUM="1"
    NEW_BOOTNUM="0"
elif [ "$2" = "/boot/loader.1/grub.cfg" ]; then
    OLD_BOOTNUM="0"
    NEW_BOOTNUM="1"
else
    echo "Usage: $0 -o /boot/loader.[01]/grub.cfg"
    exit 3
fi

LOADER_DIR="$(dirname "$2")"

if [ -d "$LOADER_DIR/uki" ]; then
    # Might be a left over from e.g. a failed previous run.
    echo "Removing (old) $LOADER_DIR/uki"
    rm -r "$LOADER_DIR/uki"
fi
mkdir "$LOADER_DIR/uki"

for entry_file in "$LOADER_DIR"/entries/*.conf; do
    echo "Parsing BLS entry file '$entry_file':"

    # 1. Parse the BLS configfile:
    ENTRY_TITLE="$(grep "^title " "$entry_file" | sed 's/^title //')"
    ENTRY_VERSION="$(grep "^version " "$entry_file" | sed 's/^version //')"
    ENTRY_OPTIONS="$(grep "^options " "$entry_file" | sed 's/^options //')"
    ENTRY_LINUX="$(grep "^linux " "$entry_file" | sed 's/^linux //')"
    ENTRY_INITRD="$(grep "^initrd " "$entry_file" | sed 's/^initrd //')"

    # Technically the 'version' is supposed to be sorted using debian version sort style, but we assume
    # that the filenames generated by ostree are enough for ordering, which will probably break once you have 9+ deployments

    ENTRY_FILENAME="${entry_file##*/}"
    UKI_PATH="$LOADER_DIR/uki/${ENTRY_FILENAME%.conf}.efi"
    echo "Resulting UKI will be stored in '$UKI_PATH'"

    echo "$ENTRY_OPTIONS" > "$UKI_PATH.cmdline"

    # Build the actual UKI, note that it is always rebuilt / shouldn't exist yet
    # --preserve-dates: For a reproducible timestamp in the PE header
    objcopy \
        --preserve-dates \
        --add-section .cmdline="$UKI_PATH.cmdline" --change-section-vma .cmdline=0x30000 \
        --add-section .linux="/boot/$ENTRY_LINUX" --change-section-vma .linux=0x2000000 \
        --add-section .initrd="/boot/$ENTRY_INITRD" --change-section-vma .initrd=0x3000000 \
        /usr/lib/systemd/boot/efi/linuxx64.efi.stub \
        "$UKI_PATH"
done

# Sync build images to /boot/efi
# See also <https://bugzilla.gnome.org/show_bug.cgi?id=724246>

ESP_DIR="/boot/efi/EFI/bauen1-uki"

mkdir -p "$ESP_DIR.0" "$ESP_DIR.1"
sync --file-system "/boot/efi/EFI"

echo "OLD_BOOTNUM: $OLD_BOOTNUM"
echo "NEW_BOOTNUM: $NEW_BOOTNUM"

# We assume, that the currently used Boot variables point to "$ESP_DIR.$OLD_BOOTNUM", so we can safely
# remove "$ESP_DIR.$NEW_BOOTNUM"

# Figure out some values for modifying UEFI Boot variables:
ESP_DEVICE="$(df /boot/efi | tail -1 | awk '{ print $1 }')"
ESP_PARTNUM="$(cat /sys/class/block/"$(basename "$ESP_DEVICE")"/partition)"
ESP_PARTUUID="$(blkid "$ESP_DEVICE" -o export | awk -F'=' '/PARTUUID=/ { print $2 }' )"
echo "device=$ESP_DEVICE partnum=$ESP_PARTNUM partuuid=$ESP_PARTUUID"

cleanup_bootvars() {
    # Removes any boot variables referencing a certain $ESP_DIR.$BOOTNUM
    # $1: bootnum

    # Now we know that we are looking for something similar to:
    # HD($ESP_PARTNUM,GPT,$ESP_PARTUUID,somehex,somehex)/File(\EFI\bauen1-uki.$BOOTNUM\.*)

    # efibootmgr outputs like:
    # BootXXXX* title with possible spaces\tActualEntry
    ENTRIES="$(efibootmgr -v | grep -E '^Boot[[:xdigit:]]{4}' | awk -F'\t' '/^[^\t]+\tHD\('"$ESP_PARTNUM,GPT,$ESP_PARTUUID"',.*\)\/File\(\\EFI\\bauen1-uki.'"$1"'\\.*\)$/ { print $0 }')"

    printf "Boot entries that will be removed:\n%s\n" "$ENTRIES"

    for entry in $(echo "$ENTRIES" | grep -E '^Boot[[:xdigit:]]{4}' --only-matching | sed 's/^Boot//'); do
        echo "Removing $entry"
        efibootmgr --delete-bootnum --bootnum "$entry"
    done
}

# 1. Cleanup any left over Boot variables still pointing to $ESP_DIR.$NEW_BOOTNUM
cleanup_bootvars "$NEW_BOOTNUM"

# 2. Cleanup $ESP_DIR.$NEW_BOOTNUM
if [ -e "$ESP_DIR.$NEW_BOOTNUM" ]; then
    echo "Removing $ESP_DIR.$NEW_BOOTNUM"
    rm -r "$ESP_DIR.$NEW_BOOTNUM"
    sync --file-system "/boot/efi/EFI"
else
    echo "Skipping removal of $ESP_DIR.$NEW_BOOTNUM, does not exist"
fi

# 3. Create new $ESP_DIR.$NEW_BOOTNUM
echo "Creating $ESP_DIR.$NEW_BOOTNUM"
mkdir "$ESP_DIR.$NEW_BOOTNUM"
cp -v "$LOADER_DIR/uki"/*.efi "$ESP_DIR.$NEW_BOOTNUM"/
sync --file-system "/boot/efi/EFI"

# 4. Create new Boot variables
for f in "$ESP_DIR.$NEW_BOOTNUM"/*; do
    echo "Creating Boot entry for file '$f':"
    efibootmgr \
        --create \
        --disk="$ESP_DEVICE" \
        --part="$ESP_PARTNUM" \
        --label="${f##*/}" \
        --loader="${f##/boot/efi}"
done

# 5. Set BootOrder (and maybe BootNext ?)
# FIXME: efibootmgr --create adds the entries to the currently defined BootOrder, however I need to verify
#        what order is used, and if that is already what is necessary
#        It appears to already do everything correctly.

# 6. Remove now unused old Boot variables
cleanup_bootvars "$OLD_BOOTNUM"

# Finally actually touch the output file to make ostree happy
echo "Touching empty (fake) output file '$2'"
touch "$2"

ericcurtin commented 2 months ago

Btw, for the Android Boot Image implementation this is what we did (its high-level design is very similar to UKIs).

UKIs aren't designed to have as malleable a cmdline as a BLS file locally client-side, so we set ostree karg to simply:

ostree=true

Then we created symlinks like:

/ostree/root.a /ostree/root.b

which pointed to two different sysroots (the ostree systemd generator parsed the osname/stateroot from this symlink also).
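As a rough illustration of the last point, a stateroot can be derived from such a symlink if its target follows the usual ostree layout `deploy/<stateroot>/deploy/<checksum>.<serial>`. That target format is an assumption here (the comment above does not spell it out), and `stateroot_from_link` is a made-up name rather than the generator's actual code.

```sh
# Print the stateroot (osname) encoded in a deployment symlink.
# $1: symlink such as /ostree/root.a
stateroot_from_link() {
    target="$(readlink "$1")" || return 1
    case "$target" in
        *deploy/*/deploy/*)
            # Strip everything up to the first "deploy/", leaving
            # "<stateroot>/deploy/<checksum>.<serial>".
            rest="${target#*deploy/}"
            printf '%s\n' "${rest%%/*}"
            ;;
        *)
            # Target does not look like an ostree deployment path.
            return 1
            ;;
    esac
}
```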

travier commented 14 hours ago

So it looks like we have 3 options:

We chainload a UKI using GRUB:
- https://wiki.archlinux.org/title/GRUB#Chainloading_a_unified_kernel_image
- Not great, needs GRUB, needs rewriting GRUB config bits each time
Use systemd-boot and drop the UKI in <ESP>/EFI/Linux/:
- https://wiki.archlinux.org/title/Unified_kernel_image#systemd-boot
- Obviously needs "systemd-boot support" in ostree first: https://github.com/ostreedev/ostree/issues/1719
Use the efi entry from the BLS spec:
- https://uapi-group.org/specifications/specs/boot_loader_specification/#type-1-boot-loader-specification-entries
- Not supported by Fedora's/GRUB's blsconfig support: https://fedoraproject.org/wiki/Changes/BootLoaderSpecByDefault#Differences_from_BootLoaderSpec

So it looks like the only option in the end is to write the UKI in /boot/ostree/ and generate a bit of GRUB config in /boot/grub2 to chainload the UKI.
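A sketch of what generating that bit of GRUB config might look like, loosely following the chainloading example on the Arch wiki page linked above. The function name, the menu title, and the way the filesystem UUID and UKI path are supplied are all assumptions; depending on the filesystem that holds /boot, an additional `insmod` line for the filesystem module may be needed.

```sh
# Print a GRUB menuentry that chainloads a UKI.
# $1: menu title
# $2: UUID of the filesystem that holds the UKI
# $3: path of the UKI relative to the root of that filesystem
print_uki_menuentry() {
    cat <<EOF
menuentry "$1" {
    insmod chain
    search --no-floppy --set=root --fs-uuid $2
    chainloader $3
}
EOF
}
```

One entry would be emitted per deployment, for example `print_uki_menuentry "Example deployment 0" "$BOOT_UUID" "/ostree/example-0.efi"`, where `BOOT_UUID` and the path are placeholders.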

ostreedev / ostree

Support UKI #2753
