openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

grub-probe ("suddenly") fails with "algorithm inherit not supported" #15261

Open zviratko opened 12 months ago

zviratko commented 12 months ago

System information

Type                  Version/Name
Distribution Name     Gentoo
Distribution Version  ~amd64
Kernel Version        6.4.15 (but also on 6.3.x)
Architecture          amd64
OpenZFS Version       2.2.0-rc3 (but also on stable 2.1.12)

Describe the problem you're observing

This occurred after I changed the motherboard in my home server. I had to recompile the kernel/modules to boot the system, so I used an Ubuntu livecd (23.04) with chroot, recompiled the kernel and ran grub-mkconfig. It failed (I don't remember how exactly), but I expected that and corrected the next boot by hand. The changes made were related to graphics/framebuffer; nothing like an architecture setting was touched.

After booting into the real system I recompiled the kernel again and ran grub-mkconfig, only to discover it still doesn't boot properly because ${rpool} used by mkconfig was empty.

This is because:

# grub-probe --device /dev/nvme0n1p4 /dev/nvme1n1p4 --target=fs_label
grub-probe: error: compression algorithm inherit not supported

My rpool is a mirror of two nvme device partitions:

        rpool                                                ONLINE       0     0     0
          mirror-0                                           ONLINE       0     0     0
            nvme-eui.0000000001000000e4d25c6114d64f01-part4  ONLINE       0     0     0
            nvme-eui.0000000001000000e4d25c324dd64f01-part4  ONLINE       0     0     0

It doesn't fail for my boot pool, which looks the same but only has a single (root pool) filesystem mounted at /boot.

I have not touched the rpool configuration at all, and I have not enabled any new features (unless Ubuntu or its systemd-operating-system decided to do that for me somehow, but I don't think so, and zpool history concurs).

Not enabled features:

rpool
      draid
      zilsaxattr
      head_errlog
      blake3
      block_cloning
      vdev_zaps_v2

I tried setting compression explicitly (compression=on, compression=lz4), deleting snapshots in case it's related to an old bug I found (but nothing was changed according to zpool history), recompiling grub, and upgrading ZFS and recompiling grub again.

The system still boots if I change LINUX_ROOT_DEVICE="ZFS=${rpool}${bootfs%/}" to LINUX_ROOT_DEVICE="ZFS=rpool/${bootfs%/}". It's just grub-probe that fails.
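
A minimal sketch of that workaround as a guarded fallback instead of an unconditionally hard-coded pool name. The function name `zfs_root_device` and its arguments are hypothetical and not part of GRUB's scripts; the sketch only assumes that a failed grub-probe leaves the pool label empty, as described above. The dataset path in the usage line is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper, not part of GRUB: build the "ZFS=<pool><dataset>" root
# string, using a fallback pool name when the probed label is empty.
#   $1 = pool label as returned by grub-probe (may be empty)
#   $2 = dataset path relative to the pool root, e.g. "/ROOT/gentoo"
#   $3 = fallback pool name to use when $1 is empty
zfs_root_device() {
    label=$1
    bootfs=$2
    fallback=$3
    if [ -z "$label" ]; then
        label=$fallback
    fi
    # Strip a trailing slash, mirroring the ${bootfs%/} expansion quoted above.
    printf 'ZFS=%s%s\n' "$label" "${bootfs%/}"
}
```

For example, `zfs_root_device "" "/ROOT/gentoo" "rpool"` prints `ZFS=rpool/ROOT/gentoo`, while a non-empty probed label is used unchanged.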

Describe how to reproduce the problem

No idea what changed for this to happen.


Any ideas what this might be, or how to provide useful debug output? Should I bug the GNU GRUB people with this? I can try grub-2.12-rcX, but I didn't find anything in the changelog and I'd rather find the problem first.

Thanks!

GregorKopka commented 11 months ago

Suggestion: Switch to ZFSBootMenu and forget about GRUB (and its inability to fully deal with ZFS features).

zviratko commented 11 months ago

Good suggestion. I only discovered ZFSBootMenu right after creating this issue, which is a shame as it looks awesome (didn't yet switch to it myself, though).

dmdx86 commented 11 months ago

duplicate of this issue: https://github.com/openzfs/zfs/issues/13873 but ultimately a grub issue: https://savannah.gnu.org/bugs/index.php?64297

tl;dr snapshotting the top-level dataset of your boot pool will cause this issue. I ran into this a while back and I believe once you do the snapshot, it becomes a permanent irreversible condition and you have to destroy and re-create the pool. My experience was that removing the snapshot did not fix the issue.

zviratko commented 11 months ago

@dmdx86 I don't think it's the same issue, but it is probably of the same kind. I have always snapshotted all filesystems in that pool without an issue, and nothing has changed (according to my memory, and confirmed by history); all I did was import the pool in the Ubuntu LiveCD and reboot. That triggered something, but I have no idea what. Feel free to close this until somebody else hits this issue; I'll work on migrating to ZFSBootMenu in the meantime ;-)

meilon commented 10 months ago

I have the same issue, and I have also always taken snapshots. The last lines of the debug output of grub-probe (v2.06) are:

grub-core/fs/zfs/zfs.c:3395:zfs: endian = 1
grub-core/fs/zfs/zfs.c:3170:zfs: endian = 1
grub-core/fs/zfs/zfs.c:1885:zfs: zio_read: E 0: size 0/512
grub-core/kern/fs.c:79:fs: error: compression algorithm inherit not supported

The full output is here: https://pastebin.com/dJWrj482

I updated my kernel and corresponding zfs packages, did a reboot, and voila, GRUB doesn't want to boot anymore.

I can't get ZFSBootMenu to work with my setup (basically an older version of https://openzfs.github.io/openzfs-docs/Getting%20Started/Arch%20Linux/Root%20on%20ZFS.html); it doesn't detect any boot environments. If someone has a hint, that would be great!

dmdx86 commented 10 months ago

If you have a separate pool just for booting (which is what the current ZFS docs recommend nowadays) then IMHO the easiest thing to do is to back up all the data, destroy and re-create the bpool, and take special precaution to never snapshot the top level of the bpool. There should not be much in your bpool other than kernels, initrds, and similar files, so it shouldn't be as painful as blowing away your entire dpool.

zviratko commented 10 months ago

@dmdx86 bpool is not the problem, the problem is rpool, and snapshots are one of the reasons we run ZFS there... :) I'm wondering if snapshotting my boot pool would break it as well, even though I have set compatibility=legacy for it...

dmdx86 commented 10 months ago

If you set up the pools and grub correctly you should not have any data on rpool that grub is referencing. Grub does not need to read anything other than bpool data. Once grub loads the kernel, the kernel takes over and imports your other pools.

zviratko commented 10 months ago

I see that there's a misunderstanding of the issue I'm reporting.

I do not have a problem with GRUB not loading the kernel or initramfs from the bpool and booting the kernel. I have a problem with the scripts constructing the root=ZFS=... portion of kernel cmdline, where it calls grub-probe. That's this part of code in /etc/grub.d/10_linux:

case x"$GRUB_FS" in
    xbtrfs)
        rootsubvol="`make_system_path_relative_to_its_root /`"
        rootsubvol="${rootsubvol#/}"
        if [ "x${rootsubvol}" != x ]; then
            GRUB_CMDLINE_LINUX="rootflags=subvol=${rootsubvol} ${GRUB_CMDLINE_LINUX}"
        fi;;
    xzfs)
        rpool=`${grub_probe} --device ${GRUB_DEVICE} --target=fs_label 2>/dev/null || true`
        bootfs="`make_system_path_relative_to_its_root / | sed -e "s,@$,,"`"
        LINUX_ROOT_DEVICE="ZFS=${rpool}${bootfs%/}"
        ;;
esac

On my machine, it executes like so:

# grub-probe --device /dev/nvme0n1p4 /dev/nvme1n1p4 --target=fs_label
grub-probe: error: compression algorithm inherit not supported

Of course the issue could be the same, but I wonder how many people do NOT snapshot their rpools and so never hit this issue? Therefore, I don't think my issue is snapshots, but rather a similar class of issue with grub-probe where it suddenly fails to detect the pool. I am assuming that a similar piece of code is present in grub-probe and in the GRUB bootloader itself, so why would only grub-probe fail here? (Though it comes to mind that I don't actually update the GRUB bootloader in my EFI boot partition often, only the userland tools; in fact I last did that 2 years ago and I am not going to touch it now for sure :))

Also, I find the way this is done in that script pretty weird and prone to breakage. This could just be a static setting somewhere (/etc/default/grub or similar), or just taken from the current cmdline (which could fail in chroot environments and rescue CDs, but that's where it fails anyway in my experience ¯\_(ツ)_/¯), or just left for the initramfs (distribution) to figure out, or constructed from the output of "zfs mount" or even "mount". At least in the case of ZFS this makes little sense, and with other filesystems "mount" would suffice.

I wonder what the GRUB people were trying to solve there: some embedded systems? Making it work with livecds and installers? I know you don't always have a proper rootfs entry in mtab (you might not even have /proc mounted, so no mtab), so I guess grub-probe tries to be more clever and self-contained?
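
The "constructed from the output of mount" idea could look roughly like the sketch below. It is not how GRUB does it: the function name `zfs_dataset_for_mountpoint` is hypothetical, and the input is assumed to be in the /proc/self/mounts format (device, mount point, filesystem type, then options).

```shell
#!/bin/sh
# Hypothetical helper: read a mount table in /proc/self/mounts format on stdin
# and print the ZFS dataset mounted at the given mount point (default "/").
# If several entries match, the last one wins, since later mounts shadow
# earlier ones. Prints nothing if no ZFS filesystem is mounted there.
zfs_dataset_for_mountpoint() {
    awk -v mp="${1:-/}" '
        $2 == mp && $3 == "zfs" { ds = $1 }
        END { if (ds != "") print ds }
    '
}
```

On a running system this could be fed with `zfs_dataset_for_mountpoint / < /proc/self/mounts`, and the result prefixed with `ZFS=` for the kernel command line. As noted above, this only works where the mount table is available and reflects the real root.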

almereyda commented 10 months ago

Seeing the same after upgrading from Ubuntu 23.04 to 23.10. Snapshotting all datasets in this system has never been an issue since this machine was installed with the Ubiquity Desktop installer on ZFS with Ubuntu 21.10.

On Ubuntu, the described error message appears when editing the GRUB entry and replacing search --no-floppy --fs-uuid --set=root ... manually with set root=(hd0,gpt3). Else it fails with No such device: ....

zviratko commented 10 months ago

For the record: I am not running Ubuntu but Gentoo.

I wouldn't want to mix different issues here; let's concentrate on why "grub-probe" fails on my pool, if someone wants to investigate. The GRUB bootloader part is a slightly different issue (maybe the same fix will work for both, maybe not), and I would take ZFSBootMenu elsewhere (ZFS mailing list?) to not pollute this issue further...

almereyda commented 10 months ago

Refactored, thanks for reminding me.

SemanticBeeng commented 9 months ago

tl;dr snapshotting the top-level dataset of your boot pool will cause this issue. I ran into this a while back and I believe once you do the snapshot, it becomes a permanent irreversible condition and you have to destroy and re-create the pool

Happens on Debian 12 also. Ran sanoid and this happened.

Is there any explanation of exactly what happens, and how a zfs snapshot operation can mutate the state of the pool or partition?!

update: proxmox has support for addressing the "fragility of booting from ZFS with GRUB" https://pve.proxmox.com/wiki/ZFS:_Switch_Legacy-Boot_to_Proxmox_Boot_Tool / "Repairing a System Stuck in the GRUB Rescue Shell"

tomgray commented 9 months ago

I also experienced this issue when upgrading to OpenZFS 2.2.2.

Recreating the pool with compression disabled fixed it for me (I have snapshots on the pool, OpenZFS 2.2.2):

zviratko commented 8 months ago

PSA: I updated today to grub-2.12 and grub-probe now works. Not sure if it was grub changes or something else that changed in the meantime.

mifritscher commented 7 months ago

I can confirm that updating from 2.12-rc1 to 2.12 helps. ( https://github.com/openzfs/zfs/issues/13873#issuecomment-1889885090 )
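
For scripting around this, here is a rough check of whether a GRUB version string is the 2.12 release or newer. It encodes only the two reports in this thread (2.12 works, 2.12-rc1 does not). The function name `grub_is_2_12_or_newer` is hypothetical, and the version formats handled (plain "2.06", "2.12-rc1", distribution suffixes after "-", "~" or "+") are assumptions.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if the given GRUB version string is
# the 2.12 release or newer, fail (exit 1) if it is older. Release candidates
# of 2.12 (for example "2.12-rc1" or "2.12~rc1") count as older, matching the
# reports above. Exit status 2 means the string could not be parsed.
grub_is_2_12_or_newer() {
    ver=$1
    base=${ver%%[-~+]*}            # drop suffixes such as -rc1 or -1ubuntu7
    major=${base%%.*}
    rest=${base#*.}
    minor=${rest%%.*}
    # Strip leading zeros so "06" compares as 6; an all-zero field becomes 0.
    minor=$(printf '%s' "$minor" | sed 's/^0*//')
    minor=${minor:-0}
    case "$major" in ''|*[!0-9]*) return 2 ;; esac
    case "$minor" in *[!0-9]*) return 2 ;; esac
    if [ "$major" -gt 2 ]; then return 0; fi
    if [ "$major" -lt 2 ]; then return 1; fi
    if [ "$minor" -gt 12 ]; then return 0; fi
    if [ "$minor" -lt 12 ]; then return 1; fi
    case "$ver" in
        *rc*) return 1 ;;          # a 2.12 release candidate
        *)    return 0 ;;
    esac
}
```

The version string itself would have to come from the installed tools, for example the last field of `grub-probe --version`, assuming that output ends with the version number.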

n0099 commented 7 months ago

https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/2041739/comments/9

zpool create \
    -o feature@extensible_dataset=disabled \
    -o feature@bookmarks=disabled \
    -o feature@filesystem_limits=disabled \
    -o feature@large_blocks=disabled \
    -o feature@large_dnode=disabled \
    -o feature@sha512=disabled \
    -o feature@skein=disabled \
    -o feature@edonr=disabled \
    -o feature@userobj_accounting=disabled \
    -o feature@encryption=disabled \
    -o feature@project_quota=disabled \
    -o feature@obsolete_counts=disabled \
    -o feature@bookmark_v2=disabled \
    -o feature@redaction_bookmarks=disabled \
    -o feature@redacted_datasets=disabled \
    -o feature@bookmark_written=disabled \
    -o feature@livelist=disabled \
    -o feature@zstd_compress=disabled \
    -o feature@zilsaxattr=disabled \
    -o feature@head_errlog=disabled \
    -o feature@blake3=disabled \
    -o feature@vdev_zaps_v2=disabled \
[...]

Enabling any of the features in the command above will cause grub not to recognize /boot as zfs again when a snapshot is created on bpool.
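
To see which of those features are not disabled on an existing pool, something like the following sketch could be used. The function name `list_risky_features` and the variable `GRUB_RISKY_FEATURES` are hypothetical. The feature list is copied from the command above, which is itself truncated ("[...]"), so the set may be incomplete. Input is assumed to be the tab-separated output of `zpool get -H -o property,value all <pool>`.

```shell
#!/bin/sh
# Feature names exactly as they appear in the quoted zpool create command.
# The quoted list is truncated, so this set may be incomplete.
GRUB_RISKY_FEATURES="extensible_dataset bookmarks filesystem_limits large_blocks
large_dnode sha512 skein edonr userobj_accounting encryption project_quota
obsolete_counts bookmark_v2 redaction_bookmarks redacted_datasets
bookmark_written livelist zstd_compress zilsaxattr head_errlog blake3
vdev_zaps_v2"

# Hypothetical helper: read "property<TAB>value" lines on stdin and print each
# listed feature whose value is anything other than "disabled".
list_risky_features() {
    awk -F '\t' -v names="$GRUB_RISKY_FEATURES" '
        BEGIN {
            n = split(names, a, " ")
            for (i = 1; i <= n; i++) risky["feature@" a[i]] = 1
        }
        ($1 in risky) && $2 != "disabled" { print $1 "=" $2 }
    '
}
```

Usage would be along the lines of `zpool get -H -o property,value all bpool | list_risky_features`; empty output means none of the listed features is enabled or active on that pool.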

SimonBard commented 6 months ago

If you have a separate pool just for booting (which is what the current ZFS docs recommend nowadays) then IMHO the easiest thing to do is to back up all the data, destroy and re-create the bpool, and take special precaution to never snapshot the top level of the bpool. There should not be much in your bpool other than kernels, initrds, and similar files, so it shouldn't be as painful as blowing away your entire dpool.

How should I recreate the bpool? I mean, I know how to create a pool, but then it's empty. How do I get the needed data there again?

  1. Should I just install the system from scratch?
  2. Should I use a snapshot/backup of bpool to get it back?

n0099 commented 6 months ago

How do I get the needed data there again?

https://openzfs.github.io/openzfs-docs/Getting%20Started/Ubuntu/Ubuntu%2022.04%20Root%20on%20ZFS.html#step-5-grub-installation

GregorKopka commented 6 months ago

Make a recursive snapshot of the old pool, zfs send that as replication stream somewhere (could be a file somewhere outside that pool), recreate the pool and receive the replication stream into it, install bootloader, reboot.

Gregor
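
The steps above can be sketched as a dry-run helper that only prints the commands, so each one can be reviewed before being run by hand. The function name `print_bpool_recreate_plan`, the snapshot name, and the stream file path are hypothetical. The `zpool create` options are deliberately left out because they depend on the setup, and the stream file has to live outside the pool being destroyed.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print (do not execute) the steps described above.
#   $1 = pool name
#   $2 = snapshot name
#   $3 = file that will hold the replication stream (outside the pool)
print_bpool_recreate_plan() {
    pool=$1
    snap=$2
    stream=$3
    echo "zfs snapshot -r ${pool}@${snap}"
    echo "zfs send -R ${pool}@${snap} > ${stream}"
    echo "zpool destroy ${pool}"
    echo "# recreate ${pool} here with your own zpool create options"
    echo "zfs receive -F ${pool} < ${stream}"
    echo "# reinstall the bootloader, then reboot"
}
```

For example, `print_bpool_recreate_plan bpool migrate /root/bpool.zfs` prints the six lines for a pool named bpool. Nothing is changed on disk until the printed commands are run manually.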

SimonBard commented 6 months ago

Make a recursive snapshot of the old pool, zfs send that as replication stream somewhere (could be a file somewhere outside that pool), recreate the pool and receive the replication stream into it, install bootloader, reboot. Gregor

Many thanks! How do I install the bootloader?

SimonBard commented 6 months ago

How do I get the needed data there again?

https://openzfs.github.io/openzfs-docs/Getting%20Started/Ubuntu/Ubuntu%2022.04%20Root%20on%20ZFS.html#step-5-grub-installation

Many thanks!

Unfortunately, I ran into errors at the first step:

grub-probe /boot
grub-probe: Achtung: Platte existiert nicht, ersatzweise wird Partition des Geräts /dev/sdb4 verwendet
grub-probe: Achtung: Platte existiert nicht, ersatzweise wird Partition des Geräts /dev/sdb4 verwendet
grub-probe: Achtung: Platte existiert nicht, ersatzweise wird Partition des Geräts /dev/sdb4 verwendet
grub-probe: Fehler: Laufwerk >hostdisk//dev/sdb4< wurde nicht gefunden

Translation: grub-probe: warning: disk does not exist, so falling back to partition device /dev/sdb4 (printed three times); grub-probe: error: disk >hostdisk//dev/sdb4< not found

zpool list
Name  size ALLOC
bpool 1.88 G
rpool 1.84 T

n0099 commented 6 months ago

@SimonBard please show the output of:

zfs get mountpoint,canmount bpool
stat /boot