rancher / elemental-toolkit

:snowflake: The toolkit to build, ship and maintain cloud-init driven Linux derivatives based on container images
https://rancher.github.io/elemental-toolkit/docs/
Apache License 2.0

Bootloader install fails upon unrecognized efi boot entries #2087

Closed: ssusbauer closed this issue 1 month ago

ssusbauer commented 1 month ago

elemental-toolkit version: Unknown. I am seeing this behavior in downstream Harvester, but it appears to come from the elemental-toolkit bootloader installation process. I do not know how to verify which version Harvester uses. In Harvester 1.1.2 the installer displays a warning and continues, but as of 1.2.0 it fails hard instead.

Describe the bug
This issue is described in more detail in Harvester issue https://github.com/harvester/harvester/issues/4528

The symptom is an error message such as "cannot read device path: cannot decode node: 1: invalid length 14 bytes (too large)", followed by a failure to create the boot entry, which is a fatal error in the installer.

I believe the error comes from the elemental-toolkit, because the error strings can be traced to the elemental-toolkit bootloader code and its EFI support. Specifically, they appear when scanning EFI boot entries that the EFI library does not know how to parse. Instead of ignoring or warning about the entries it cannot process, as it did at some point in the past, the toolkit now treats them as a fatal error, which prevents bootloader installation and causes the install to fail.

To Reproduce
On an affected system, simply having EFI boot entries that the installer does not know how to handle is enough to trigger the behavior. The problem goes away when those entries are deleted manually with efibootmgr before the install.

Expected behavior
Ignore or warn about EFI entries the library cannot process, but continue to create a new entry for the freshly installed system. I believe the scan mainly exists so the installer can delete or reuse an entry from a previous install rather than create duplicates. Unparsable entries should not hamper this, since they should not point to a previous elemental-toolkit install.

frelon commented 1 month ago

Hi @ssusbauer!

This is an interesting bug! Generally I would say we should try to reproduce this on a more recent toolkit version, since go-efilib has had several releases (0.3.1 -> 0.9.5), but I will look upstream to see whether I can reproduce this using the logs from the linked issue!