rhboot / shim

UEFI shim loader

Proof-of-concept for "A/B" fallback booting #502

Open frozencemetery opened 2 years ago

frozencemetery commented 2 years ago

Add a tool (shimctl) that makes shim updates more robust by keeping two copies of shim around (shimARCH.efi and shimARCH_b.efi). Together with shim-booted.service, it also supports tracking what versions of shim were booted (and whether they were the "B" entry), enabling boot counting.

See-also: https://mivehind.net/2022/08/17/shim-ab-booting-poc/
Signed-off-by: Robbie Harwood <rharwood@redhat.com>

julian-klode commented 2 years ago

Sweet. I'm gonna look at this next week when I get back to work; this is super interesting to me.

miabbott commented 2 years ago

This looks similar to https://github.com/coreos/bootupd/

cc: @cgwalters

cgwalters commented 2 years ago

Yeah, seems like it'd be pretty straightforward to integrate this into bootupd.

frozencemetery commented 2 years ago

Well, python isn't set in stone - this is a proof of concept after all. I think we'd rather rewrite it in C than add a dependency on bootupd to shim.

cgwalters commented 2 years ago

It'd be good to do a sync on this, because bootupd is the thing that updates shim and grub for us today and moving ownership of it to a separate service definitely requires some thought.

On a tangential topic, today bootupd is only really tested on rpm-ostree-based systems, but it was explicitly designed to be decoupled from that. A tricky thing here is trying to move ownership of updates to the EFI partition from yum/rpm to a different service. Today rpm-ostree enforces moving files from /boot to /usr/lib/ostree-boot, and then bootupd moves them again as part of this: https://github.com/coreos/fedora-coreos-config/blob/254cf4f4e083b36f94a7008a620b59a069ce12c2/manifests/bootupd.yaml#L13

Are you aiming for this code to support traditional yum/dnf systems and (rpm-)ostree based ones? If it's just the former, then I guess for now we could reimplement this logic in bootupd, and canonically use bootupd on rpm-ostree based systems.

> I think we'd rather rewrite it in C than add a dependency on bootupd to shim.

I'm not sure that such a thing would need to be a hard dependency, but rather if bootupd isn't installed then shim wouldn't have A/B updates and would just be updated via rpm directly as it is today or so.

But is there a specific reason you think a bootupd dependency would be problematic?

frozencemetery commented 2 years ago

Colin Walters writes:

> It'd be good to do a sync on this,

Sure. The next couple weeks are tight, but if you'd like a call and can find a time that works for both of us, feel free to put something on my calendar. (If you can't find a time, let me know over email/IRC and I'll see if I can move some stuff.)

> because bootupd is the thing that updates shim and grub for us today and moving ownership of it to a separate service definitely requires some thought.

"us" being coreos/rpm-ostree?

> On a tangential topic, today bootupd is only really tested on rpm-ostree-based systems, but it was explicitly designed to be decoupled from that. A tricky thing here is trying to move ownership of updates to the EFI partition from yum/rpm to a different service. Today rpm-ostree enforces moving files from /boot to /usr/lib/ostree-boot, and then bootupd moves them again as part of this: https://github.com/coreos/fedora-coreos-config/blob/254cf4f4e083b36f94a7008a620b59a069ce12c2/manifests/bootupd.yaml#L13

> Are you aiming for this code to support traditional yum/dnf systems and (rpm-)ostree based ones?

Yes, ideally. Right now we are siloed: ostree's boot stack has been assembled without involvement from our current bootloader engineering. In an ideal world, I'd like the two reintegrated: robust updates with fallback are hard to say no to, after all. Having more things that behave the same way also makes my life as a distro maintainer easier, so there's a selfish motivation to that as well :)

> If it's just the former, then I guess for now we could reimplement this logic in bootupd, and canonically use bootupd on rpm-ostree based systems.

I don't see why you'd reimplement it, given in that case it'll already be shipped with shim. Surely you'd just call shim's tool (in this proposal called shimctl) from bootupd?

> > I think we'd rather rewrite it in C than add a dependency on bootupd to shim.

> I'm not sure that such a thing would need to be a hard dependency, but rather if bootupd isn't installed then shim wouldn't have A/B updates and would just be updated via rpm directly as it is today or so.

Well, the thing is I want all shim to have A/B fallback support :)

There's very little reason I can think of not to have it, and the added reliability of having fallback is attractive.

> But is there a specific reason you think a bootupd dependency would be problematic?

Fedora is allergic to both packages growing dependencies and install size increasing. Outside ostree, there's no need for the rest of what bootupd is doing that I can see - updates to the ESP are already atomic.

bootupd is also mentioned in its own README.md as being disabled by default, even on coreos. That means we are quite far away from a world where we could add a dependency on it to shim (and thereby enable it for everyone on UEFI).

Be well, --Robbie

cgwalters commented 2 years ago

> updates to the ESP are already atomic.

When done via rpm directly writing things? Really? How can that be, given that there are multiple files involved in both shim and grub? Is anything actually testing it?

> bootupd is also mentioned in its own README.md as being disabled by default, even on coreos. That means we are quite far away from a world where we could add a dependency on it to shim

Yeah, but the only reason for that is the non-atomicity, or at least my belief of that.

But it may indeed be the case that a model like this, using the EFI variables to do an A/B setup, is sufficient, particularly if, when an incomplete update hits a failure and we roll back, we're able to retry the update.
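The thread does not spell out which EFI variables the PoC reads or writes. As a rough illustration of what "using the EFI variables" involves on Linux, here is a minimal sketch that decodes `BootCurrent` and `BootOrder` as exposed by efivarfs. The helper names `parse_u16_list` and `read_boot_var` are made up for this example, and nothing here is taken from shimctl itself.

```python
import struct

# GUID of the EFI global variable namespace (BootOrder, BootCurrent, Boot####).
EFI_GLOBAL_GUID = "8be4df61-93ca-11d2-aa0d-00e098032b8c"
EFIVARS_DIR = "/sys/firmware/efi/efivars"


def parse_u16_list(raw: bytes) -> list:
    """Decode an efivarfs payload: 4 attribute bytes, then little-endian uint16s."""
    data = raw[4:]  # efivarfs prefixes every variable with a 32-bit attribute mask
    if len(data) % 2:
        raise ValueError("payload is not a whole number of uint16 values")
    return list(struct.unpack("<%dH" % (len(data) // 2), data))


def read_boot_var(name: str) -> list:
    """Read a uint16-array boot variable such as 'BootOrder' or 'BootCurrent'."""
    path = "%s/%s-%s" % (EFIVARS_DIR, name, EFI_GLOBAL_GUID)
    with open(path, "rb") as f:
        return parse_u16_list(f.read())
```

On a UEFI system, `read_boot_var("BootCurrent")[0]` would give the number of the entry the firmware actually booted, and `read_boot_var("BootOrder")` the order it will try next time. Comparing the two is one way a service could tell whether a fallback entry was used, though whether the PoC does it this way is an assumption.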

frozencemetery commented 2 years ago

> > updates to the ESP are already atomic.

> When done via rpm directly writing things? Really? How can that be,

The filesystem provides atomic renames, so all that's needed is to use them in scriptlets instead of writing files directly into place. (Or get rpm/dpkg to do that, but windmills etc.) An installation script, not too different from the sort I'm proposing in this PoC :)
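For readers unfamiliar with the pattern, a minimal sketch of "write elsewhere, then rename into place" follows. It takes the comment's premise that rename is atomic on the target filesystem as given. The function name `atomic_replace` is hypothetical and is not code from the PoC.

```python
import os
import tempfile


def atomic_replace(dest: str, data: bytes) -> None:
    """Replace dest so that a reader sees either the old or the new contents."""
    directory = os.path.dirname(os.path.abspath(dest))
    # The temporary file must live on the same filesystem as dest,
    # otherwise the final step would be a copy rather than a rename.
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # contents reach the disk before the rename
        os.replace(tmp, dest)  # rename over the destination
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        try:
            os.unlink(tmp)
        except FileNotFoundError:
            pass
        raise
    # Flush the directory entry too, so the rename itself survives a power cut.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

A scriptlet would call something like `atomic_replace("/boot/efi/EFI/fedora/shimx64.efi", new_bytes)` instead of letting the package manager write the file in place; the path here is only an example.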

> given that there are multiple files involved in both shim and grub?

Atomic, not transactional.

Defining the order of updates covers the concerns about upgrading here, especially when there's fallback available. For grub, update grubx64.efi before updating grub.cfg; for shim, update in the order shim.efi, fallback.efi, boot.efi, boot.csv.

There are certain things that we could then not do on updates - e.g., remove something from boot.csv, though that's well into problems land for other reasons already.

So it's not transactional, but it's never going to be as long as there are multiple files involved - stuff needs to live at the top level of the ESP. But done that way, it is resilient against a power failure during updates.
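The "atomic per file, fixed order across files" idea described above can be sketched roughly as follows. The file names and their order are copied from the comment; the function `update_in_order` and the staging-directory layout are assumptions made for illustration only.

```python
import os
import shutil

# Order as given in the comment above; real file names carry an arch suffix.
SHIM_ORDER = ["shim.efi", "fallback.efi", "boot.efi", "boot.csv"]


def update_in_order(src_dir: str, esp_dir: str, order=SHIM_ORDER) -> list:
    """Install each file from src_dir into esp_dir, one atomic rename at a time.

    An interruption leaves a prefix of the list updated and the rest at the
    old version, never a half-written file. Returns the names that were updated.
    """
    updated = []
    for name in order:
        src = os.path.join(src_dir, name)
        if not os.path.exists(src):
            continue  # nothing staged for this file
        tmp = os.path.join(esp_dir, "." + name + ".tmp")
        shutil.copyfile(src, tmp)
        with open(tmp, "r+b") as f:
            os.fsync(f.fileno())  # new contents are on disk before the rename
        os.replace(tmp, os.path.join(esp_dir, name))
        updated.append(name)
    return updated
```

Because each step is a single rename, the state after a power failure is always some point in this sequence, which is why picking an order in which every intermediate state still boots is what matters.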

> Is anything actually testing it?

There has not been observed demand for this. The state of filesystems in the kernel may mean that the window of potential problems for the status quo case is vanishingly small, which could be why, but I'm just guessing.