rhboot / fwupdate

System firmware update support for UEFI machines

Some IBV UEFI implementations will overwrite BootNext if not in BootOrder #55

Closed superm1 closed 7 years ago

superm1 commented 8 years ago

It's been brought to my attention that some UEFI implementations will potentially overwrite any Boot#### variables that are not present in BootOrder during bootup.

Because of this, we've seen scenarios where entries are overwritten by built-in entries as the UEFI boot manager reshuffles them on boot-up. Putting the entries into BootOrder prevents this from happening.

So I'd like to propose that fwupdate does the following:

In the Linux application

In the EFI application

The net result should be that whenever a FW update is staged the entry will be created and placed in the BootOrder. This should allow it to also show up in the one time boot menu.

When the update has run (successfully or not) the entry would be cleaned up and would no longer show in the one time boot menu.

A future capsule update would perform the same steps again.
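As a rough illustration of the stage/clean-up cycle described above, here is a minimal Python sketch that works on the raw BootOrder payload, which the UEFI spec defines as an array of little-endian 16-bit option numbers. The helper names (`parse_boot_order`, `stage_entry`, `unstage_entry`) are hypothetical and are not fwupdate functions. Appending the entry at the end, so the default boot target stays the same, is an assumption; the proposal does not say where in BootOrder the entry should go.

```python
import struct


def parse_boot_order(data):
    """Decode a BootOrder payload into a list of Boot#### numbers."""
    if len(data) % 2:
        raise ValueError("BootOrder payload must be a whole number of uint16s")
    return list(struct.unpack("<%dH" % (len(data) // 2), data))


def pack_boot_order(entries):
    """Encode a list of Boot#### numbers back into a BootOrder payload."""
    return struct.pack("<%dH" % len(entries), *entries)


def stage_entry(boot_order, entry):
    """Return a BootOrder payload that references `entry`.

    Assumption: the entry is appended, so the existing order is preserved.
    """
    entries = parse_boot_order(boot_order)
    if entry not in entries:
        entries.append(entry)
    return pack_boot_order(entries)


def unstage_entry(boot_order, entry):
    """Return a BootOrder payload with `entry` removed (clean-up step)."""
    entries = [e for e in parse_boot_order(boot_order) if e != entry]
    return pack_boot_order(entries)
```

Note that when reading BootOrder through efivarfs on Linux, the file contents start with a 4-byte attributes field that has to be stripped before the payload is passed to these helpers.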

vathpela commented 8 years ago

I'm pretty skeptical about this, tbh. Having to be in BootOrder is a completely non-spec requirement - UEFI 2.4 has no such requirement, and in UEFI 2.5 we explicitly say that the firmware should not be changing any Boot#### or BootOrder variables. (BootNext still gets deleted once booting it has been attempted.)

Additionally, even in 2.4 this behavior is hard to work with and confusing to users - a Boot#### variable from the OS should NEVER be deleted or modified by the system firmware unless the user has done so through some explicit interface to do so. It has been set for a reason, and removing it or modifying it automatically because of some failure makes it harder to figure out what has gone wrong. It also may be a transient failure - you've removed a device while you're working on a system or similar - that will be corrected, and removing a Boot#### / BootOrder entry has transitioned the system to a state that will fail later, without any reason for doing so.

So I think system vendors really need to push back on these IBVs and ship a fix for this issue at the same time (or before, I guess) they introduce ESRT/UpdateCapsule() support.

Are there systems that have shipped which already support ESRT/UpdateCapsule() and have this bug? Is there any way of identifying them? I'd rather not modify BootOrder on machines where we don't have to - it's almost as hard for the user to figure out what's going on when we try to do 3-card-monte with BootOrder as when the firmware does.

superm1 commented 8 years ago

I appreciate the skepticism. I am working with the team to push back on this behavior, but the reality is that v2.4 of the UEFI spec was vague about automatic maintenance of variables, beyond saying that it can happen:

The boot manager may perform automatic maintenance of the database variables. For example, it may remove unreferenced load option variables or any load option variables that cannot be parsed or loaded, and it may rewrite any ordered list to remove any load options that do not have corresponding load option variables. In addition, the boot manager may automatically update any ordered list to place any of its own load options where it desires. The boot manager can also, at its own discretion, provide for manual maintenance operations as well. Examples include choosing the order of any or all load options, activating or deactivating load options, etc.

What's actually happening on these affected systems is that the boot manager is adding its own load options (for, say, PXE, legacy devices [w/ legacy orom on], USB devices, etc.). As part of adding these entries, it treats any Boot#### variable that isn't in BootOrder as falling under the automatic maintenance category, and so as something that can be overwritten.
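To make the at-risk set concrete, here is a small Python sketch that lists Boot#### entries not referenced by BootOrder, i.e. the ones this firmware behavior may overwrite. The function name `unreferenced_boot_entries` is made up for illustration. It assumes you pass in variable names (for example efivarfs file names of the form `Boot0001-<guid>`) and the raw BootOrder payload with any efivarfs attribute header already stripped.

```python
import re
import struct

# Matches Boot#### names, with or without a trailing "-<guid>" suffix.
# BootOrder, BootNext and BootCurrent do not match: their suffixes are not hex.
BOOT_ENTRY_RE = re.compile(r"^Boot([0-9A-Fa-f]{4})(?:-|$)")


def unreferenced_boot_entries(var_names, boot_order):
    """Return sorted Boot#### numbers that exist but are not in BootOrder."""
    count = len(boot_order) // 2
    referenced = set(struct.unpack("<%dH" % count, boot_order[: count * 2]))
    present = set()
    for name in var_names:
        match = BOOT_ENTRY_RE.match(name)
        if match:
            present.add(int(match.group(1), 16))
    return sorted(present - referenced)
```

A freshly created firmware update entry that is only pointed to by BootNext would show up in this list, which is exactly the case described above.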

Version 2.5 (and 2.6) of the spec is (in my mind: emphasis below) very explicit that this type of behavior shouldn't be allowed and that anything valid in both BootNext and BootOrder shouldn't be touched.

The boot manager may perform automatic maintenance of the database variables. For example, it may remove unreferenced load option variables or any load option variables that cannot be parsed, and it may rewrite any ordered list to remove any load options that do not have corresponding load option variables. The boot manager can also, at its own discretion, provide an administrator with the ability to invoke manual maintenance operations as well. Examples include choosing the order of any or all load options, activating or deactivating load options, initiating OS-defined or platform-defined recovery, etc. In addition, if a platform intends to create PlatformRecovery####, before attempting to load and execute any DriverOrder or BootOrder entries, the firmware must create any and all PlatformRecovery#### variables (see Section 3.4.2). The firmware should not, under normal operation, automatically remove any correctly formed Boot#### variable currently referenced by the BootOrder or BootNext variables. Such removal should be limited to scenarios where the firmware is guided by direct user interaction.

So as to if there are systems out there with ESRT and UpdateCapsule() - yes there are. The affected systems don't currently have capsules published to LVFS.

As for detecting this? It's so deep in the BIOS boot manager code that I don't think you have anything that will detect it programmatically, unless you were to try to detect that the entry in BootNext wasn't executed on the next boot. I can probably get a list of affected Dell systems to put into a blacklist, but I've been told this behavior is part of the IBV core. I doubt Dell is the only one that will be affected. We're just the only ones actively trying to make the full stack on Linux work on all our boxes.

So it might be a little heavy handed, but given this is a result of a UEFI 2.4 behavior maybe detect UEFI 2.4 or less and do something?
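For what a version check along those lines might look like, here is a hedged Python sketch. It relies on the revision encoding used in the EFI system table header, `(major << 16) | minor`, where the minor field is two decimal digits (40 for 2.4, 31 for 2.3.1). The function names are hypothetical, and how the revision value is obtained from the running system is left out on purpose.

```python
UEFI_2_5_REVISION = (2 << 16) | 50  # encoding of UEFI 2.5


def decode_uefi_revision(revision):
    """Split a system table revision into (major, minor, sub-minor)."""
    major = revision >> 16
    minor_field = revision & 0xFFFF
    return major, minor_field // 10, minor_field % 10


def predates_uefi_2_5(revision):
    """True if the firmware reports a spec revision older than 2.5."""
    return revision < UEFI_2_5_REVISION
```

This only tells you what spec revision the firmware claims, not whether its boot manager actually overwrites unreferenced entries, so at best it is a coarse heuristic.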

vathpela commented 7 years ago

I'm going to just make fwupdate 9 do this by default, and try to suss out for fwupdate 10 when we can conditionalize it.