zbm-dev / zfsbootmenu

ZFS Bootloader for root-on-ZFS systems with support for snapshots and native full disk encryption
https://zfsbootmenu.org
MIT License

cannot get ZBM working on Tianocore UEFI #431

Open jimsalterjrs opened 1 year ago

jimsalterjrs commented 1 year ago

I have ZBM working fine on a few bare metal systems, but am unable to get successful boot on UEFI-based VMs using Tianocore UEFI.

Whether I use the Ubuntu install docs for ZBM itself, or Simon Cahill's automated Ubuntu installer for ZBM at https://github.com/Sithuk/ubuntu-server-zfsbootmenu, everything appears fine until the first boot into the installed system, which produces only a UEFI shell.

If I [esc] into the Tianocore Boot Manager, I can see ZFSBootMenu (if I installed using the direct UEFI boot from the main project docs), or rEFInd Boot Manager (if I installed using the rEFInd option in the main project docs, or using Simon's automated installer). But selecting ZFSBootMenu or rEFInd as appropriate does nothing, just blinks the screen and returns to the menu.

ahesford commented 1 year ago

How are you setting up the VM? We use a Tianocore firmware image from Arch for testing and haven't noticed any issues. I'd like to see if we can replicate your issue with an alternative VM setup.

jimsalterjrs commented 1 year ago

qemu-img create to make a QCOW2 image (which lives in a dataset on a ZBM-installed Ubuntu workstation), then I create the VM as a manual install using that image as the root drive, with an Ubuntu 22.04 desktop installer attached as a SATA CD-ROM that I boot from to get the live environment.
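Roughly, the image creation step looks like this (the path and size here are placeholders, not my exact values):

qemu-img create -f qcow2 /images/zbm1/zbm1.qcow2 64G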

From there, I've tried following the directions for both direct UEFI boot and rEFInd boot at https://docs.zfsbootmenu.org/en/latest/guides/ubuntu/uefi.html, and for the automated install (which uses rEFInd boot) at https://github.com/Sithuk/ubuntu-server-zfsbootmenu.

In every case, I end up with an unbootable install, with the symptoms described above.

FWIW, I have never done a fully manual installation using the project's directions, but I've successfully used the automated script several times. Also FWIW, I tried both methods several times, including wiping everything and beginning again from scratch (not just trying to patch existing failed attempts by redoing specific sections).

jimsalterjrs commented 1 year ago

Oh, and I'm using virt-manager on the Ubuntu host workstation to manage the VM.

One final note if you want it: in order to get Simon's script to work (aside from the failed boot issue), I needed to manually specify a "serial number" for the QCOW2 virtual drive; that makes it show up in /dev/disk/by-id, which is where Simon's script looks for drives. (For your own project's instructions, I just had to change /dev/sda to /dev/vda in the initial variable export section.)
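For anyone else hitting that, the serial can be set in the VM's libvirt disk definition. A minimal sketch, with an arbitrary serial string and illustrative paths:

  <disk type='file' device='disk'>
    <driver name='qemu' type='qcow2'/>
    <source file='/images/zbm1/zbm1.qcow2'/>
    <target dev='vda' bus='virtio'/>
    <serial>zbm1-root</serial>
  </disk>

With a virtio disk, that serial then shows up in the guest as /dev/disk/by-id/virtio-zbm1-root.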

ahesford commented 1 year ago

Would you mind swapping your firmware image with our copy? You should be able to just save this anywhere you like, then run

virsh edit <your-vm>

to open an editor with your libvirt domain contents. Replace the value in the <loader> tag under the <os firmware='efi'> block with the path to the copy you downloaded, then launch the VM and see if it works. For example, I have a <loader> tag that looks like

<loader readonly='yes' type='pflash'>/usr/local/share/ovmf/x64/OVMF_CODE.fd</loader>

Note that libvirt seems to kill any comments when it consumes this XML file, so if you want to save the current value of the <loader> tag, you can do something like

virsh dumpxml your-vm > your-vm.xml

to save a copy.

If you'd prefer, you can instead grab the Arch firmware from their package: unpack the package and take usr/share/edk2/x64/OVMF_CODE.fd from it. (There are secure boot variants and some other alternatives, but I assume a standard firmware image is sufficient.)
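For illustration, pulling just that file out of the Arch package would look something like this (the exact package filename depends on the current edk2-ovmf version):

tar -xf edk2-ovmf-<version>-any.pkg.tar.zst usr/share/edk2/x64/OVMF_CODE.fd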

jimsalterjrs commented 1 year ago
root@box:/images/zbm1# wget https://raw.githubusercontent.com/zbm-dev/zfsbootmenu/master/testing/stubs/OVMF_CODE.fd

Then did virsh edit zbm1 and set the loader as follows:

  <os>
    <type arch='x86_64' machine='pc-q35-6.2'>hvm</type>
    <loader readonly='yes' type='pflash'>/images/zbm1/OVMF_CODE.fd</loader>
    <nvram>/var/lib/libvirt/qemu/nvram/zbm1_VARS.fd</nvram>
    <boot dev='hd'/>
  </os>

Attempting to boot the modified VM just results in a single core pegged at 100% utilization and a blank black screen. No disk I/O, no change in the CPU usage on that one vcore; it just sits there forever.

jimsalterjrs commented 1 year ago

I get similar results if I use my distribution's own OVMF_CODE.fd, which is no longer the default firmware for VMs created under 22.04. Under 22.04, the default UEFI firmware for VMs is /usr/share/OVMF/OVMF_CODE_4M.fd.

So far I've tried OVMF_CODE_4M.fd, OVMF_CODE_4M.ms.fd, OVMF_CODE.fd, and your directly supplied OVMF_CODE.fd. The 4M versions produce the UEFI shell when they fail to detect anything bootable; the non-4M versions produce a blank black screen and 100% CPU util on one core.

zdykstra commented 1 year ago

Just so we're all on the same page, can you provide links to the OVMF firmware (Ubuntu packages, or direct files) that you're using? It'd be nice to test things on our end with the exact same files that you're using.

jimsalterjrs commented 1 year ago
root@elden:/usr/share/OVMF# apt policy ovmf
ovmf:
  Installed: 2022.02-3ubuntu0.22.04.1
  Candidate: 2022.02-3ubuntu0.22.04.1
  Version table:
 *** 2022.02-3ubuntu0.22.04.1 500
        500 http://us.archive.ubuntu.com/ubuntu jammy-updates/main amd64 Packages
        500 http://us.archive.ubuntu.com/ubuntu jammy-updates/main i386 Packages
        100 /var/lib/dpkg/status
     2022.02-3 500
        500 http://us.archive.ubuntu.com/ubuntu jammy/main amd64 Packages
        500 http://us.archive.ubuntu.com/ubuntu jammy/main i386 Packages

My OVMF package comes from here: https://packages.ubuntu.com/jammy/ovmf

It's built from this source package here: http://archive.ubuntu.com/ubuntu/pool/main/e/edk2/edk2_2022.02.orig.tar.xz

You can find the binary deb for the OVMF portion only here: http://archive.ubuntu.com/ubuntu/pool/main/e/edk2/ovmf_2022.02-3ubuntu0.22.04.1_all.deb

I have verified with diff that the files in that package, when extracted, are a binary match for the files installed on my actual system.

jimsalterjrs commented 1 year ago

BTW, you mentioned Arch, so if you would like a refresher on how to take Debian packages apart to get at the juicy bits, this one helped me refresh my memory: https://www.cyberciti.biz/faq/how-to-extract-a-deb-file-without-opening-it-on-debian-or-ubuntu-linux/
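Which boils down to roughly the following (the data member may be data.tar.xz or data.tar.zst depending on how the deb was built):

ar x ovmf_2022.02-3ubuntu0.22.04.1_all.deb
tar -xf data.tar.xz
diff usr/share/OVMF/OVMF_CODE.fd /usr/share/OVMF/OVMF_CODE.fd  # extracted copy vs. installed copy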

ahesford commented 1 year ago

Well, the plot thickens. I can successfully boot one of our test VMs with your Ubuntu OVMF_CODE.fd, although using OVMF_CODE_4M.fd causes qemu to report "could not load PC BIOS". (Note that I get the same error if I try to use the Arch 4M image, so that particular problem is probably an incorrect invocation of qemu on my part.) Our test environment just drives qemu directly and probably misses a lot of the extra flags that libvirt adds to use things like the other firmware image. I'll see about getting a libvirt VM up and running to more closely match your configuration.

jimsalterjrs commented 1 year ago

Thanks @ahesford , I really appreciate it. FWIW, I'm in the process of writing up an article about ZFSBootMenu for Klara Systems, and it would really be a lot easier to do that if I can play with things in my normal virtualized environment, with my normal tools. :)

zdykstra commented 1 year ago

I'm not able to reproduce this issue.

In a fresh test VM, I did a generic install of Ubuntu Server on /dev/vda, just accepting whatever defaults it provided. After rebooting the VM and logging in, I installed the release EFI:

cd /boot/efi/ubuntu
curl -LO https://get.zfsbootmenu.org/zfsbootmenu.EFI
efibootmgr -c -d /dev/vda -p 1 -L "ZFSBootMenu" -l \\EFI\\UBUNTU\\ZFSBOOTMENU.EFI
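If you want to sanity-check the new entry before rebooting, this should list it along with its \EFI\UBUNTU\ZFSBOOTMENU.EFI path:

efibootmgr -v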

Rebooting the system again, Tianocore loaded ZFSBOOTMENU.EFI - after ~10 seconds or so, it warned me that no pools could be found to import - as expected, since there were no ZFS pools on the machine.

working.txt

I've attached the XML definition from this VM to this issue so that you can compare it to your local VM.

zdykstra commented 1 year ago

Have you had any luck getting things going in your test environment? You can also do direct kernel/initramfs booting for your VMs: there's an option in virt-manager to point the VM at a kernel and initramfs on the host's filesystem - just grab https://get.zfsbootmenu.org/tar.gz, extract it, and store it somewhere the qemu process has permission to read.
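In the domain XML, that corresponds to <kernel>, <initrd> and (optionally) <cmdline> elements in the <os> block. A rough sketch, with illustrative paths pointing at wherever you extracted the tarball:

  <os>
    <type arch='x86_64' machine='pc-q35-6.2'>hvm</type>
    <kernel>/images/zbm/vmlinuz-bootmenu</kernel>
    <initrd>/images/zbm/initramfs-bootmenu.img</initrd>
    <cmdline>zbm.timeout=10</cmdline>
  </os>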

zdykstra commented 1 year ago

@jimsalterjrs any luck with this on your end?

zdykstra commented 1 year ago

Feel free to re-open this if you have any additional notes/information about this issue.

curiousercreative commented 7 months ago

@zdykstra perhaps I'm running into the same issue. I'm trying to get ZBM running on my laptop, a System76 Galago Pro running their open firmware stack (coreboot, edk2). I've opened an issue on that repo and included the error that was logged. If you're interested in reproducing, you may need to build the firmware yourself, or I can supply you with my built ROM.

swarren commented 4 months ago

FWIW I had the exact same issue, using virt-manager under Ubuntu 22.04 with all current updates. I got annoyed and destroyed the VM but kept the disk image, then created a new VM re-using that disk image. During configuration of the new VM, before "installation", I switched the firmware from OVMF_CODE_4M.ms.fd to OVMF_CODE_4M.fd. I think that's what made it work, unless there was just something else/unrelated wrong in the configuration of the original VM. I didn't change anything other than selecting which file to use for the HDD image (in my case, an LVM LV).

no-usernames-left commented 3 months ago

> FWIW, I'm in the process of writing up an article about ZFSBootMenu for Klara Systems

@jimsalterjrs Did this get released? I looked around but can't seem to find it.

thomasfaingnaert commented 1 month ago

> During configuration of the new VM, before "installation", I switched the firmware from OVMF_CODE_4M.ms.fd to OVMF_CODE_4M.fd. I think that's what made it work [...]

FWIW, I also faced this issue on Ubuntu 22.04, and switching the OVMF firmware to /usr/share/OVMF/OVMF_CODE_4M.fd in virt-manager fixed it for me.
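If it helps anyone compare, the firmware an existing libvirt VM is configured with can be checked with something like:

virsh dumpxml <your-vm> | grep -i loader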

classabbyamp commented 1 month ago

I believe the .ms. OVMF file is the one with Secure Boot enabled, so it might make sense that ZBM wouldn't work with it.