openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

snapshotting top-level "bpool" filesystem causes grub to fail #13873

Open bghira opened 1 year ago

bghira commented 1 year ago

System information

Type                  Version/Name
Distribution Name     Gentoo
Distribution Version  amd64
Kernel Version        5.19.6-gentoo
Architecture          amd64
OpenZFS Version       zfs-kmod-2.1.99-1358_g60d995727 / zfs-2.1.99-1359_gede037cda

Describe the problem you're observing

Last week I noticed that GRUB stopped being able to find my zpool, which was created with features disabled so that GRUB can detect it.

I ran grub-probe /path/to/bpool and it shows the error:

# grub-probe /roots/gentoo/boot 
grub-probe: error: compression algorithm inherit not supported 
.

I tried setting the compression algorithm explicitly on each filesystem in the bpool; no change.

Describe how to reproduce the problem

I recreated the pool:

zpool create -o ashift=13 -o autotrim=on -d \
    -o feature@async_destroy=enabled \
    -o feature@bookmarks=enabled \
    -o feature@embedded_data=enabled \
    -o feature@empty_bpobj=enabled \
    -o feature@enabled_txg=enabled \
    -o feature@extensible_dataset=enabled \
    -o feature@filesystem_limits=enabled \
    -o feature@hole_birth=enabled \
    -o feature@large_blocks=enabled \
    -o feature@lz4_compress=enabled \
    -o feature@spacemap_histogram=enabled \
    -O acltype=posixacl -O canmount=off -O compression=lz4 -O devices=off \
    -O normalization=formD -O relatime=on -O xattr=sa -O mountpoint=/boot \
    bpool mirror /dev/nvme0n1p3 /dev/nvme1n1p3

And then, the command works:

# grub-probe /roots/gentoo/boot 
zfs 

I then snapshot the child filesystem:

# zfs snap bpool/BOOT/gentoo@boot2 
# grub-probe /roots/gentoo/boot 
zfs 
# zfs snap bpool/BOOT@boot2 
# grub-probe /roots/gentoo/boot 
zfs 

I can snapshot the lower-level child filesystems without issue. Then I snapshot the pool's top-level filesystem itself:

# zfs snap bpool@boot2 
# grub-probe /roots/gentoo/boot 
grub-probe: error: compression algorithm inherit not supported 
.

That's when things go south. Up until this point, I can reboot readily and Grub works just fine. This was working for a very long time, and I haven't upgraded Grub at all.

ryao commented 1 year ago

This looks like a bug in grub. Let us keep this open to track the issue (and encourage patches), but someone should file a bug with the GRUB project.

mauricev commented 1 year ago

https://savannah.gnu.org/bugs/index.php?64297

R8s6 commented 8 months ago

I still encounter this error running Arch with Grub version 2:2.12rc1-5.

I had to destroy the pool and recreate one, then disable snapshotting on the "boot" pool as a temporary workaround.

Alternatively, one can take snapshots of the datasets but not the pool.

Edit: ~i.e. If you're using sanoid, instead of using recursive = yes, you can use recursive = zfs~

mifritscher commented 6 months ago

I'm using grub-probe (GRUB) 2.12~rc1-12 (but I also looked at grub master, with no notable difference).

I got this problem after updating from zfs 2.1.4 to 2.2.2 (without zfs upgrade).

a grub-install -v -v says:

grub-core/kern/fs.c:56:fs: Detecting zfs...
grub-core/osdep/hostdisk.c:379:hostdisk: opening the device `/dev/nvme0n1p12' in open_device()
grub-core/fs/zfs/zfs.c:1199:zfs: label ok 0
grub-core/osdep/hostdisk.c:358:hostdisk: reusing open device `/dev/nvme0n1p12'
grub-core/fs/zfs/zfs.c:1014:zfs: check 2 passed
grub-core/fs/zfs/zfs.c:1025:zfs: check 3 passed
grub-core/fs/zfs/zfs.c:1032:zfs: check 4 passed
grub-core/fs/zfs/zfs.c:1042:zfs: check 6 passed
grub-core/fs/zfs/zfs.c:1050:zfs: check 7 passed
grub-core/fs/zfs/zfs.c:1061:zfs: check 8 passed
grub-core/fs/zfs/zfs.c:1071:zfs: check 9 passed
grub-core/fs/zfs/zfs.c:1093:zfs: check 11 passed
grub-core/fs/zfs/zfs.c:1119:zfs: check 10 passed
grub-core/fs/zfs/zfs.c:1135:zfs: str=com.delphix:embedded_data
grub-core/fs/zfs/zfs.c:1135:zfs: str=com.delphix:hole_birth
grub-core/fs/zfs/zfs.c:1144:zfs: check 12 passed (feature flags)
grub-core/fs/zfs/zfs.c:1885:zfs: zio_read: E 0: size 4096/4096
grub-core/fs/zfs/zfs.c:1907:zfs: endian = -1
grub-core/fs/zfs/zfs.c:597:zfs: dva=8, 1a0b050
grub-core/osdep/hostdisk.c:358:hostdisk: reusing open device `/dev/nvme0n1p12'
grub-core/fs/zfs/zfs.c:2694:zfs: endian = -1, blkid=0
grub-core/fs/zfs/zfs.c:2031:zfs: endian = -1
grub-core/fs/zfs/zfs.c:2062:zfs: endian = -1
grub-core/fs/zfs/zfs.c:1885:zfs: zio_read: E 0: size 131072/4096
grub-core/fs/zfs/zfs.c:1907:zfs: endian = -1
grub-core/fs/zfs/zfs.c:597:zfs: dva=8, c0b8f0
grub-core/osdep/hostdisk.c:358:hostdisk: reusing open device `/dev/nvme0n1p12'
grub-core/fs/zfs/zfs.c:2031:zfs: endian = 1
grub-core/fs/zfs/zfs.c:2057:zfs: endian = 1
grub-core/fs/zfs/zfs.c:1885:zfs: zio_read: E 0: size 16384/4096
grub-core/fs/zfs/zfs.c:1907:zfs: endian = 1
grub-core/fs/zfs/zfs.c:597:zfs: dva=8, 16111f0
grub-core/osdep/hostdisk.c:358:hostdisk: reusing open device `/dev/nvme0n1p12'
grub-core/fs/zfs/zfs.c:2699:zfs: alive
grub-core/fs/zfs/zfs.c:2505:zfs: looking for 'features_for_read'
grub-core/fs/zfs/zfs.c:2031:zfs: endian = 1
grub-core/fs/zfs/zfs.c:2057:zfs: endian = 1
grub-core/fs/zfs/zfs.c:1885:zfs: zio_read: E 0: size 16384/4096
grub-core/fs/zfs/zfs.c:1907:zfs: endian = 1
grub-core/fs/zfs/zfs.c:597:zfs: dva=8, d00178
grub-core/osdep/hostdisk.c:358:hostdisk: reusing open device `/dev/nvme0n1p12'
grub-core/fs/zfs/zfs.c:2515:zfs: zap read
grub-core/fs/zfs/zfs.c:2528:zfs: fat zap
grub-core/fs/zfs/zfs.c:2031:zfs: endian = 1
grub-core/fs/zfs/zfs.c:2057:zfs: endian = 1
grub-core/fs/zfs/zfs.c:1885:zfs: zio_read: E 0: size 16384/4096
grub-core/fs/zfs/zfs.c:1907:zfs: endian = 1
grub-core/fs/zfs/zfs.c:597:zfs: dva=8, b09900
grub-core/osdep/hostdisk.c:358:hostdisk: reusing open device `/dev/nvme0n1p12'
grub-core/fs/zfs/zfs.c:2288:zfs: fzap: length 18
grub-core/fs/zfs/zfs.c:2532:zfs: returned 0
grub-core/fs/zfs/zfs.c:2694:zfs: endian = -1, blkid=1
grub-core/fs/zfs/zfs.c:2031:zfs: endian = -1
grub-core/fs/zfs/zfs.c:2062:zfs: endian = -1
grub-core/fs/zfs/zfs.c:1885:zfs: zio_read: E 0: size 131072/4096
grub-core/fs/zfs/zfs.c:1907:zfs: endian = -1
grub-core/fs/zfs/zfs.c:597:zfs: dva=8, c0b8f0
grub-core/fs/zfs/zfs.c:2031:zfs: endian = 1
grub-core/fs/zfs/zfs.c:2057:zfs: endian = 1
grub-core/fs/zfs/zfs.c:1885:zfs: zio_read: E 0: size 16384/4096
grub-core/fs/zfs/zfs.c:1907:zfs: endian = 1
grub-core/fs/zfs/zfs.c:597:zfs: dva=8, 160f8c0
grub-core/osdep/hostdisk.c:358:hostdisk: reusing open device `/dev/nvme0n1p12'
grub-core/fs/zfs/zfs.c:2699:zfs: alive
grub-core/fs/zfs/zfs.c:2031:zfs: endian = 1
grub-core/fs/zfs/zfs.c:2057:zfs: endian = 1
grub-core/fs/zfs/zfs.c:1885:zfs: zio_read: E 0: size 512/512
grub-core/fs/zfs/zfs.c:1907:zfs: endian = 1
grub-core/fs/zfs/zfs.c:597:zfs: dva=8, 600cb0
grub-core/osdep/hostdisk.c:358:hostdisk: reusing open device `/dev/nvme0n1p12'
grub-core/fs/zfs/zfs.c:2118:zfs: zap: name = com.delphix:extensible_dataset, value = 4, cd = 0
grub-core/fs/zfs/zfs.c:2118:zfs: zap: name = com.delphix:embedded_data, value = 1, cd = 0
grub-core/fs/zfs/zfs.c:2118:zfs: zap: name = com.delphix:hole_birth, value = 1, cd = 0
grub-core/fs/zfs/zfs.c:2118:zfs: zap: name = org.open-zfs:large_blocks, value = 0, cd = 0
grub-core/fs/zfs/zfs.c:2118:zfs: zap: name = org.illumos:lz4_compress, value = 1, cd = 0
grub-core/fs/zfs/zfs.c:2118:zfs: zap: name = , value = 0, cd = 0
grub-core/fs/zfs/zfs.c:2118:zfs: zap: name = , value = 0, cd = 0
grub-core/fs/zfs/zfs.c:3293:zfs: alive
grub-core/fs/zfs/zfs.c:3105:zfs: endian = 1
grub-core/fs/zfs/zfs.c:2694:zfs: endian = 1, blkid=0
grub-core/fs/zfs/zfs.c:2031:zfs: endian = 1
grub-core/fs/zfs/zfs.c:2062:zfs: endian = 1
grub-core/fs/zfs/zfs.c:1885:zfs: zio_read: E 0: size 131072/4096
grub-core/fs/zfs/zfs.c:1907:zfs: endian = 1
grub-core/fs/zfs/zfs.c:597:zfs: dva=8, c0b8f0
grub-core/fs/zfs/zfs.c:2031:zfs: endian = 1
grub-core/fs/zfs/zfs.c:2057:zfs: endian = 1
grub-core/fs/zfs/zfs.c:1885:zfs: zio_read: E 0: size 16384/4096
grub-core/fs/zfs/zfs.c:1907:zfs: endian = 1
grub-core/fs/zfs/zfs.c:597:zfs: dva=8, 16111f0
grub-core/fs/zfs/zfs.c:2699:zfs: alive
grub-core/fs/zfs/zfs.c:3112:zfs: alive
grub-core/fs/zfs/zfs.c:2505:zfs: looking for 'root_dataset'
grub-core/fs/zfs/zfs.c:2031:zfs: endian = 1
grub-core/fs/zfs/zfs.c:2057:zfs: endian = 1
grub-core/fs/zfs/zfs.c:1885:zfs: zio_read: E 0: size 16384/4096
grub-core/fs/zfs/zfs.c:1907:zfs: endian = 1
grub-core/fs/zfs/zfs.c:597:zfs: dva=8, d00178
grub-core/fs/zfs/zfs.c:2515:zfs: zap read
grub-core/fs/zfs/zfs.c:2528:zfs: fat zap
grub-core/fs/zfs/zfs.c:2031:zfs: endian = 1
grub-core/fs/zfs/zfs.c:2057:zfs: endian = 1
grub-core/fs/zfs/zfs.c:1885:zfs: zio_read: E 0: size 16384/4096
grub-core/fs/zfs/zfs.c:1907:zfs: endian = 1
grub-core/fs/zfs/zfs.c:597:zfs: dva=8, b09900
grub-core/fs/zfs/zfs.c:2288:zfs: fzap: length 13
grub-core/fs/zfs/zfs.c:2532:zfs: returned 0
grub-core/fs/zfs/zfs.c:3118:zfs: alive
grub-core/fs/zfs/zfs.c:2694:zfs: endian = 1, blkid=1
grub-core/fs/zfs/zfs.c:2031:zfs: endian = 1
grub-core/fs/zfs/zfs.c:2062:zfs: endian = 1
grub-core/fs/zfs/zfs.c:1885:zfs: zio_read: E 0: size 131072/4096
grub-core/fs/zfs/zfs.c:1907:zfs: endian = 1
grub-core/fs/zfs/zfs.c:597:zfs: dva=8, c0b8f0
grub-core/fs/zfs/zfs.c:2031:zfs: endian = 1
grub-core/fs/zfs/zfs.c:2057:zfs: endian = 1
grub-core/fs/zfs/zfs.c:1885:zfs: zio_read: E 0: size 16384/4096
grub-core/fs/zfs/zfs.c:1907:zfs: endian = 1
grub-core/fs/zfs/zfs.c:597:zfs: dva=8, 160f8c0
grub-core/fs/zfs/zfs.c:2699:zfs: alive
grub-core/fs/zfs/zfs.c:3124:zfs: alive
grub-core/fs/zfs/zfs.c:3302:zfs: alive
grub-core/fs/zfs/zfs.c:3306:zfs: endian = 0
grub-core/fs/zfs/zfs.c:3315:zfs: endian = 1
grub-core/fs/zfs/zfs.c:3170:zfs: endian = 1
grub-core/fs/zfs/zfs.c:1885:zfs: zio_read: E 0: size 4096/4096
grub-core/fs/zfs/zfs.c:1907:zfs: endian = 1
grub-core/fs/zfs/zfs.c:597:zfs: dva=8, 300018
grub-core/osdep/hostdisk.c:358:hostdisk: reusing open device `/dev/nvme0n1p12'
grub-core/fs/zfs/zfs.c:3395:zfs: endian = 1
grub-core/fs/zfs/zfs.c:3170:zfs: endian = 1
grub-core/fs/zfs/zfs.c:1885:zfs: zio_read: E 0: size 0/512
grub-core/kern/fs.c:79:fs: error: compression algorithm inherit not supported
.
grub-core/kern/fs.c:80:fs: zfs detection failed.
grub-install: error: compression algorithm inherit not supported

The following lines seem to be where it bails out:

  if (comp != ZIO_COMPRESS_OFF && decomp_table[comp].decomp_func == NULL)
    return grub_error (GRUB_ERR_NOT_IMPLEMENTED_YET,
               "compression algorithm %s not supported\n", decomp_table[comp].name);

The thing is: "inherit" is not a compression type per se; it means "use whatever the parent (or grandparent, or ...) is using".

So, it has no decompress method:

static decomp_entry_t decomp_table[ZIO_COMPRESS_FUNCTIONS] = {
  {"inherit", NULL},  /* ZIO_COMPRESS_INHERIT */

I think the missing piece is that grub does not look up the parent's compression setting when a dataset's compression is "inherit".

mifritscher commented 6 months ago

I can confirm that using grub 2.12 does indeed help. So one of the top 4 commits of https://git.savannah.gnu.org/cgit/grub.git/log/grub-core/fs/zfs/zfs.c solves the problem. If I had to guess, I would say "ZFS: Don't iterate over null objsets".

timkgh commented 6 months ago

I can confirm that using grub 2.12 does indeed help. So one of the top 4 commits of https://git.savannah.gnu.org/cgit/grub.git/log/grub-core/fs/zfs/zfs.c solves the problem. If I had to guess, I would say "ZFS: Don't iterate over null objsets".

How does one upgrade grub in Ubuntu 22.04 LTS in particular when I can't boot at all? I don't understand how it broke all of a sudden, I've been taking snapshots for years (with sanoid).

mifritscher commented 6 months ago

How does one upgrade grub in Ubuntu 22.04 LTS in particular when I can't boot at all? I don't understand how it broke all of a sudden, I've been taking snapshots for years (with sanoid).

You can, e.g., start a live system, install the zfs drivers, and import bpool. Then you can make a USB stick with grub and copy onto it your kernel + initrd + the grub config needed to start them. Then you can boot your installation with it (don't forget to export bpool beforehand ;)

Another way is to import both bpool and rpool, mount them together, bind-mount /dev, /run, /sys and /proc, and chroot into it. I used this approach on Debian bookworm.

Either way, you can then either install grub 2.12 packages from newer distro versions, or build grub manually (it isn't too complicated; just make sure you have the zfs and devicemapper libs installed, the configure script will tell you whether you do) and, if you use UEFI boot, pass --with-platform=efi. Then a grub-install does the job and you (hopefully) have a bootable system again.
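For reference, a rough sketch of that chroot route (pool names, device paths, and the EFI directory are assumptions; adjust them to your layout):

# From a live environment with ZFS support: import both pools under /mnt
zpool import -f -R /mnt rpool
zpool import -f -R /mnt bpool

# Bind-mount the virtual filesystems and enter the chroot
for d in dev proc sys run; do mount --rbind "/$d" "/mnt/$d"; done
chroot /mnt /bin/bash

# Inside the chroot: upgrade the grub packages to >= 2.12, then reinstall grub
grub-install --target=x86_64-efi --efi-directory=/boot/efi   # UEFI; for BIOS: grub-install /dev/sdX
update-grub

# Leave the chroot, unmount everything and export the pools before rebooting
exit
umount -R /mnt
zpool export bpool rpool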

timkgh commented 6 months ago

I managed to boot it for now using portable ZFSBootMenu running from a flash drive. But it's still unclear what I should do to fix Ubuntu 22.04. I don't have a bpool; this is an old setup from the 18.04 days, and there's a single rpool that contains /boot. I'm now thinking that was a bad idea, and I should either move /boot to its own bpool or just give up, make an ext4 partition for /boot, and avoid future grub issues.

n0099 commented 6 months ago

https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/2041739/comments/9

zpool create \
    -o feature@extensible_dataset=disabled \
    -o feature@bookmarks=disabled \
    -o feature@filesystem_limits=disabled \
    -o feature@large_blocks=disabled \
    -o feature@large_dnode=disabled \
    -o feature@sha512=disabled \
    -o feature@skein=disabled \
    -o feature@edonr=disabled \
    -o feature@userobj_accounting=disabled \
    -o feature@encryption=disabled \
    -o feature@project_quota=disabled \
    -o feature@obsolete_counts=disabled \
    -o feature@bookmark_v2=disabled \
    -o feature@redaction_bookmarks=disabled \
    -o feature@redacted_datasets=disabled \
    -o feature@bookmark_written=disabled \
    -o feature@livelist=disabled \
    -o feature@zstd_compress=disabled \
    -o feature@zilsaxattr=disabled \
    -o feature@head_errlog=disabled \
    -o feature@blake3=disabled \
    -o feature@vdev_zaps_v2=disabled \
[...]

Enabling any of the features in the command above will cause grub to stop recognizing /boot as zfs once a snapshot is created on bpool.
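To check whether an existing boot pool already has one of these features enabled or active, a read-only query like this should do (assuming the pool is named bpool):

# List every feature flag and its state (disabled/enabled/active)
zpool get all bpool | grep 'feature@'

# Show only the features that are no longer disabled
zpool get all bpool | awk '$2 ~ /^feature@/ && $3 != "disabled"'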

timkgh commented 6 months ago

FWIW, I fixed my issues by moving to ZFSBootMenu and couldn't be happier. Excellent piece of software to pair with ZFS!

dannyp777 commented 6 months ago

From what I can tell, this bug may have been around for up to 7 years. I encountered it when upgrading to Ubuntu Mantic in Nov 2023, here: https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/2041739, using grub2 package version 2.12~rc1-10ubuntu4.

I resolved the problem on my Ubuntu Mantic system by deleting the bpool and recreating it with the following zfs options:

zpool create -d \
-o compatibility=grub2,ubuntu-22.04 \
-O devices=off \
bpool \
/dev/sda3

Mathias Aerts identified that the culprit zfs flag is feature@extensible_dataset and has had success using a bpool created with all extra features disabled. I am not sure how it came to be that an incompatible flag was enabled on the bpool in the first place, but it seemed to be related to the snapshot process. He had been using sanoid; I had been using zfs-auto-snapshot. Now I am using zsys without any problems so far.

Maybe update-grub/grub-probe should check the zfs version/flags/features before trying to do anything?

bghira commented 6 months ago

this is correct ^

marmeladapk commented 6 months ago

The solution of disabling all the features listed in @n0099's post (except those not supported by the kernel) works for me so far.

I advise against just installing grub 2.12 into your boot environment unless you also update the grub packages in your system to 2.12. Old grub tools (<2.12) in your system won't be able to detect the fs_uuid of bpool and won't properly generate menu entries. This will bork your boot menu once again the next time a new kernel version causes the entries to be regenerated.
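A quick sanity check before the next kernel update regenerates the entries (the /boot path is an assumption): both probes below must succeed with the grub userland that is actually installed.

# grub-mkconfig/update-grub relies on these; if either fails, menu generation will break
sudo grub-probe --target=fs /boot
sudo grub-probe --target=fs_uuid /boot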

bghira commented 6 months ago

creating a new bpool isn't a solution, it is a workaround, and a very poor one at that.

seeing how this isn't actually a grub issue, it should likely be receiving more attention @pcd1193182

mifritscher commented 6 months ago

@bghira : Hmm? It is actually a grub issue. Between grub 2.12 rc1 and the grub 2.12 release there were 4 bug fixes (https://git.savannah.gnu.org/cgit/grub.git/log/?h=grub-2.12&qt=grep&q=zfs ; I mean the 4 from 2023-09-18). It is left as an exercise for someone else to bisect which one fixes the problem here.

Probably grub trips over one of these bugs when the pool gets used in certain (completely legal) ways.

Yes, installing 2.12 in the boot environment is a one-time hack. But e.g. Debian sid has 2.12, which can be installed on bookworm as well without much hassle. I did exactly this in my case ;-)
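If no 2.12 package exists for your distribution, a rough sketch of a manual build (the tag, configure flags, and install paths are assumptions; grub's INSTALL file lists the full build dependencies):

git clone https://git.savannah.gnu.org/git/grub.git
cd grub
git checkout grub-2.12
./bootstrap                       # needs autoconf, automake, gettext, ...
./configure --with-platform=efi   # configure reports whether the zfs/devmapper libs were found
make -j"$(nproc)"
sudo make install
sudo grub-install --target=x86_64-efi --efi-directory=/boot/efi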

SimonBard commented 5 months ago

I resolved the problem on my Ubuntu Mantic system by deleting the bpool and recreating it with the following zfs options:

zpool create -d \
-o compatibility=grub2,ubuntu-22.04 \
-O devices=off \
bpool \
/dev/sda3

Ok, so you destroyed the bpool and created it with those parameters. How do I proceed from here? I need to create a bootloader; how can I achieve this?

R8s6 commented 5 months ago

I resolved the problem on my Ubuntu Mantic system by deleting the bpool and recreating it with the following zfs options:

zpool create -d \
-o compatibility=grub2,ubuntu-22.04 \
-O devices=off \
bpool \
/dev/sda3

Ok, so you destroyed the bpool and created it with those parameters. How do I proceed from here? I need to create a bootloader; how can I achieve this?

which OS are you on (e.g. Arch, Ubuntu, Fedora, etc.)?

The general idea is to create the zfs bpool, chroot into the OS (from an external USB drive), install a bootloader (usually grub, but it could be something else), re-generate the initramfs, and probably re-install the kernels (and microcode) as well.

I know how to do it on Arch, so let me know if you're on Arch or an Arch-based OS and I can write you the exact steps.

cheers.

SimonBard commented 5 months ago

which OS are you on (e.g. Arch, Ubuntu, Fedora, etc.)?

The general idea is to create the zfs bpool, chroot into the OS (from an external USB drive), install a bootloader (usually grub, but it could be something else), re-generate the initramfs, and probably re-install the kernels (and microcode) as well.

I know how to do it on Arch, so let me know if you're on Arch or an Arch-based OS and I can write you the exact steps.

cheers.

Many thanks! I am on ubuntu.

I tried the steps described here, but I already fail at the first command of Step 5 with:

grub-probe /boot
grub-probe: Achtung: Platte existiert nicht, ersatzweise wird Partition des Geräts /dev/sdb4 verwendet
grub-probe: Achtung: Platte existiert nicht, ersatzweise wird Partition des Geräts /dev/sdb4 verwendet
grub-probe: Achtung: Platte existiert nicht, ersatzweise wird Partition des Geräts /dev/sdb4 verwendet
grub-probe: Fehler: Laufwerk >hostdisk//dev/sdb4< wurde nicht gefunden

Translation:

grub-probe: warning: disk does not exist, so falling back to partition device /dev/sdb4
grub-probe: error: disk 'hostdisk//dev/sdb4' not found

R8s6 commented 5 months ago

That guide was ambiguous about when a command is run as root (#) versus a regular user ($). In this case, grub-probe should be run as root or with sudo privileges, so could you please try this:

If you're under a non-root user, please try: $ sudo grub-probe /boot

Or login as root:

$ su -
# grub-probe /boot

Source: https://superuser.com/questions/1195918/grub-probe-warning-disk-does-not-exist-so-falling-back-to-partition-device-d

SimonBard commented 5 months ago

Many thanks!

I have followed this guide and it worked. I do not want to break it right now.

nickcmaynard commented 5 months ago

After installing noble's 2.12 packages into a mantic install, running grub-install and update-grub, and snapshotting bpool, my boot environment is now broken. I would respectfully suggest that a 2.12 install may not be the fix we are hoping for. I shall recreate with the options @dannyp777 suggests, and disable extensible_dataset in addition.

dannyp777 commented 5 months ago

I resolved the problem on my Ubuntu Mantic system by deleting the bpool and recreating it with the following zfs options:

zpool create -d \
-o compatibility=grub2,ubuntu-22.04 \
-O devices=off \
bpool \
/dev/sda3

Ok, so you destroyed the bpool and created it with those parameters. How do I proceed from here? I need to create a bootloader; how can I achieve this?

I took a copy of /boot before I deleted the bpool, then just copied it back once I had recreated the bpool. Sorry, I hadn't meant my original reply to be a comprehensive how-to. There are more details of things people tried over at the launchpad bug report.

timkgh commented 5 months ago

Using ZFSBootMenu is another, much easier option.

ptomulik commented 4 months ago

Hi, I was just struggling with this issue on Debian bookworm (grub-efi (2.06-13+deb12u1)).

I've recreated my boot pool with -O compatibility=grub2 and reinstalled grub, but this didn't help (no boot after the first snapshot made by zfs-auto-snapshot).

Then I recreated the pool again and manually installed grub-efi from backports:

apt install grub-efi/bookworm-backports

which installed version 2.12-1~bpo12+1 of the package (and updated the dependencies appropriately).

My OS survived two restarts, one before any snapshot and another after one snapshot. Still testing and observing...

amotin commented 4 months ago

@ptomulik See #15909. The grub2 compatibility config appeared to be incompatible with earlier grub versions due to a bug fixed in 2.12. That PR introduces a separate grub-2.06 config specifically for this problem. We haven't decided what to do with the grub2 config so as not to break or annoy existing users.

SimonBard commented 4 months ago

Hi, I was just struggling with this issue on Debian bookworm (grub-efi (2.06-13+deb12u1)).

I've recreated my boot pool with -O compatibility=grub2 and reinstalled grub, but this didn't help (no boot after the first snapshot made by zfs-auto-snapshot).

Then I recreated the pool again and manually installed grub-efi from backports:

apt install grub-efi/bookworm-backports

which installed version 2.12-1~bpo12+1 of the package (and updated the dependencies appropriately).

My OS survived two restarts, one before any snapshot and another after one snapshot. Still testing and observing...

Interesting, I am getting this message when trying to install grub-efi/bookworm-backports:

$ sudo apt install grub-efi/bookworm-backports
Paketlisten werden gelesen… Fertig
Abhängigkeitsbaum wird aufgebaut… Fertig
Statusinformationen werden eingelesen… Fertig
Paket grub-efi ist nicht verfügbar, wird aber von einem anderen Paket
referenziert. Das kann heißen, dass das Paket fehlt, dass es abgelöst
wurde oder nur aus einer anderen Quelle verfügbar ist.
Doch die folgenden Pakete ersetzen es:
  grub-common grub-common:i386 grub-efi-ia32-bin grub-efi-ia32

E: Veröffentlichung »bookworm-backports« für »grub-efi« konnte nicht gefunden werden.

DeepL translates this to:

$ sudo apt install grub-efi/bookworm-backports
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Package grub-efi is not available, but is referenced by another package.
This may mean that the package is missing, has been replaced,
or is only available from another source.
However, the following packages replace it:
  grub-common grub-common:i386 grub-efi-ia32-bin grub-efi-ia32

E: Release "bookworm-backports" for "grub-efi" could not be found.

ptomulik commented 4 months ago

@SimonBard Do you have appropriate apt sources in your apt config, as explained here?
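For reference, enabling bookworm-backports roughly amounts to this (the sources file name is arbitrary):

echo 'deb http://deb.debian.org/debian bookworm-backports main' | sudo tee /etc/apt/sources.list.d/backports.list
sudo apt update
sudo apt install -t bookworm-backports grub-efi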

SimonBard commented 4 months ago

@SimonBard Do you have appropriate apt sources in your apt config, as explained here?

Sorry, my bad. I did not associate bookworm with Debian.

I am using Ubuntu 22.04.

Should I install grub from a live DVD? I am using ZFSBootMenu from a USB stick atm: https://docs.zfsbootmenu.org/en/v2.3.x/general/portable.html

ptomulik commented 4 months ago

@SimonBard I haven't tried it myself, but it looks like using backports on Ubuntu is pretty similar to using backports on Debian.

https://help.ubuntu.com/community/UbuntuBackports

The sad news is that grub-efi is probably missing from jammy-backports:

https://packages.ubuntu.com/search?suite=jammy-backports&keywords=grub-efi

mabra commented 3 months ago

I still encounter this error running Arch with Grub version 2:2.12rc1-5.

I had to destroy the pool and recreate one, then disable snapshotting on the "boot" pool as a temporary workaround.

Alternatively, one can take snapshots of the datasets but not the pool.

i.e. If you're using sanoid, instead of using recursive = yes, you can use recursive = zfs

Could you please explain?

Alternatively, one can take snapshots of the datasets but not the pool.

There is no pool without the top-level filesystem, so I do not understand it! Thanks.

R8s6 commented 3 months ago

Please ignore the sanoid part; I just did another experiment, it was not accurate, and so I crossed it out of the original comment.

About taking snapshots of datasets vs. the pool, I meant this:

Let's say you have zboot as the pool, with zboot/boot and zboot/boot/default being its datasets, with zboot/boot/default mounted as /boot, something like this:

zboot                250M   614M    24K  none
zboot/boot           238M   614M    24K  none
zboot/boot/default   238M   614M  70.2M  /boot

As a temporary workaround, please try only taking snapshots of zboot/boot and/or zboot/boot/default, but not the top-level zboot; otherwise grub will fail.
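In commands, the workaround looks roughly like this (the snapshot name is arbitrary):

# Fine: snapshot only the child datasets
zfs snapshot zboot/boot@manual-1
zfs snapshot zboot/boot/default@manual-1

# Avoid: a recursive snapshot rooted at the pool also snapshots the top-level
# filesystem, which is what trips up grub-probe
# zfs snapshot -r zboot@manual-1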

zhouska commented 2 months ago

I resolved the problem on my Ubuntu Mantic system by deleting the bpool and recreating it with the following zfs options:

zpool create -d \
-o compatibility=grub2,ubuntu-22.04 \
-O devices=off \
bpool \
/dev/sda3

Ok, so you destroyed the bpool and created it with those parameters. How do I proceed from here? I need to create a bootloader; how can I achieve this?

I took a copy of /boot before I deleted the bpool, then just copied it back once I had recreated the bpool. Sorry, I hadn't meant my original reply to be a comprehensive how-to. There are more details of things people tried over at the launchpad bug report.

Your answer is incomplete and will only create more issues.

For some reason, grub2 doesn't seem to like /boot living directly on the pool's root dataset: the grub shell won't be able to list any files there.

In order to mimic what the Ubuntu installer did when the system was installed, you need to do something like this:

zfs set mountpoint=none bpool
zfs set canmount=off bpool

and then create a filesystem dataset to act as a container [1]:

zfs create -o canmount=off -o mountpoint=none bpool/BOOT

and then create the filesystem dataset itself [1]:

UUID=$(dd if=/dev/urandom bs=1 count=100 2>/dev/null | tr -dc 'a-z0-9' | cut -c-6)
zfs create -o mountpoint=/boot bpool/BOOT/ubuntu_$UUID

Only then can you mount it at an alternate path and restore all the files and directories from the step you mentioned above.
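Roughly, that restore step could look like this (the backup location is just a placeholder):

# Mount the new dataset if it is not already mounted at /boot
zfs mount bpool/BOOT/ubuntu_$UUID

# Copy the saved /boot contents back into it
cp -a /path/to/boot-backup/. /boot/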

Once done, you can verify it from within the chroot environment with:

grub-probe /boot

It should list zfs as the filesystem.

As a last step, you need to account for the new bpool mountpoint by changing the existing zfs-list.cache file or re-creating it (you already have one pointing to the old mountpoint) [2]:

mkdir /etc/zfs/zfs-list.cache
touch /etc/zfs/zfs-list.cache/bpool

# enable the tracking ZEDLET
systemctl enable zfs-zed.service
systemctl restart zfs-zed.service

# trigger a cache refresh by toggling a property on a dataset in the pool
zfs set relatime=off bpool/BOOT
zfs inherit relatime bpool/BOOT

# re-run systemd generators and reboot
systemctl daemon-reload

References: [1] [2]