openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

Unable to mount root filesystem located on an encrypted pool after upgrade to zfs 2.2.2 -> "blkptr at <ADDRESS> has invalid TYPE 95" #15666

Open EvTheFuture opened 9 months ago

EvTheFuture commented 9 months ago

System information

| Type | Version/Name |
| --- | --- |
| Distribution Name | Alpine Linux |
| Distribution Version | 3.19 |
| Kernel Version | 6.6.5 |
| Architecture | x86_64 |
| OpenZFS Version | 2.2.2 |

Describe the problem you're observing

After upgrading to Alpine Linux 3.19 (from 3.18), a PANIC occurs during the boot process with the following message: 22.1279031 PANIC: rpool: blkptr at ffffb35f54c13c00 has invalid TYPE 95 (see attached image).

After reverting to Alpine 3.18, the boot process completes successfully and everything on the encrypted pool is accessible.

image

Comparison between Alpine 3.18 and Alpine 3.19:

| Type | Alpine 3.18 | Alpine 3.19 |
| --- | --- | --- |
| ZFS | 2.1.14 | 2.2.2 |
| Kernel | 6.1.66 | 6.6.5 |

Description of the setup: a laptop with one SSD divided into three partitions (1, 2, and 3).

Partition 1: EFI System Partition.

Partition 2: ZFS pool. This partition contains the dataset used for /boot.

Partition 3: Encrypted ZFS pool. This partition contains an encrypted pool (rpool) with /root, /home, /var, etc. as datasets.

Describe how to reproduce the problem

Have the root filesystem on an encrypted pool and upgrade from Alpine Linux 3.18 to 3.19.

Include any warning/errors/backtraces from the system logs

Unable to access any logs due to PANIC...

EvTheFuture commented 9 months ago

The headline might be a bit wrong since it seems like it actually mounts the root filesystem but fails shortly after.

EvTheFuture commented 8 months ago

Update:

I built the kernel package (Linux kernel 6.1.69) and the zfs kernel module package (from zfs 2.1.14) from Alpine Linux 3.18. With these installed, the system works as expected: the boot process completes using the root fs from the encrypted zfs root pool, without the error message PANIC: rpool: blkptr at ffffb35f54c13c00 has invalid TYPE 95.

The following packages are now installed on Alpine Linux 3.19:

$ apk list -I | grep -E "zfs|linux-lts-6"
linux-lts-6.1.69-r0 x86_64 {linux-lts} (GPL-2.0-only) [installed]
zfs-2.2.2-r0 x86_64 {zfs} (CDDL-1.0) [installed]
zfs-bash-completion-2.2.2-r0 x86_64 {zfs} (CDDL-1.0) [installed]
zfs-libs-2.2.2-r0 x86_64 {zfs} (CDDL-1.0) [installed]
zfs-lts-6.1.69-r1 x86_64 {zfs-lts} (CDDL-1.0) [installed]
zfs-openrc-2.2.2-r0 x86_64 {zfs} (CDDL-1.0) [installed]
zfs-udev-2.2.2-r0 x86_64 {zfs} (CDDL-1.0) [installed]
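
The package list alone does not show which ZFS release the loaded kernel module belongs to, since the zfs-lts package version follows the kernel version. A minimal sketch for checking this on a running system, assuming `zfs version` prints one `zfs-<version>` line for the userland tools and one `zfs-kmod-<version>` line for the kernel module (the helper name `zfs_version_mismatch` is hypothetical):

```shell
#!/bin/sh
# Reads `zfs version` output on stdin and reports whether the userland
# tools and the kernel module come from the same ZFS release.
# zfs_version_mismatch is a hypothetical helper name.
zfs_version_mismatch() {
    userland=""
    kmod=""
    while IFS= read -r line; do
        case "$line" in
            zfs-kmod-*) kmod=${line#zfs-kmod-} ;;
            zfs-*)      userland=${line#zfs-} ;;
        esac
    done
    # Drop the packaging suffix (e.g. "-1") so only x.y.z is compared.
    userland=${userland%%-*}
    kmod=${kmod%%-*}
    if [ "$userland" = "$kmod" ]; then
        echo "match: $userland"
    else
        echo "mismatch: userland=$userland kmod=$kmod"
    fi
}

# Usage on a live system:
#   zfs version | zfs_version_mismatch
```

With the setup described above (2.2.2 userland, 2.1.14 module), this would be expected to report a mismatch.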

EvTheFuture commented 8 months ago

To resolve this issue I did the following:

I booted Alpine Linux with ZFS 2.1.14 and ran zpool scrub -w, which resulted in: "No known data errors" (screenshot attached).

I then booted Alpine Linux with ZFS 2.2.2 from a USB drive and ran zpool scrub -w, which resulted in: "Permanent errors have been detected in the following files..." (screenshot attached).

I then rebooted into Alpine Linux with ZFS 2.1.14 again and re-ran zpool scrub -w, which again resulted in "No known data errors".

I created a new dataset on the same pool and copied over the data from the dataset that ZFS 2.2.2 reports as having permanent errors.

When the data was successfully copied, I changed the mount point from the old dataset to the new one and rebooted Alpine Linux with ZFS 2.2.2. This time it booted as expected.
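
The workaround above can be sketched roughly as follows, run while booted with ZFS 2.1.14, where the pool reads cleanly. This is only an illustration: the dataset names (`rpool/home`, `rpool/home_new`), the mount point `/home`, and the temporary mount point `/mnt/new` are hypothetical, because the affected dataset is not named above. The script only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Dry-run sketch of the workaround: prints each command instead of
# running it unless DRY_RUN=0 is set. Dataset names and mount points
# are hypothetical; adjust them to the dataset reported by zpool status.
POOL=${POOL:-rpool}
OLD_DS=${OLD_DS:-$POOL/home}
NEW_DS=${NEW_DS:-$POOL/home_new}
MNT=${MNT:-/home}
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

migrate_dataset() {
    # Scrub and wait for completion, then list any files with errors.
    run zpool scrub -w "$POOL"
    run zpool status -v "$POOL"
    # Create the replacement dataset at a temporary mount point.
    run zfs create -o mountpoint=/mnt/new "$NEW_DS"
    # Copy the data, preserving ownership, modes and timestamps.
    run cp -a "$MNT/." /mnt/new/
    # Swap mount points so the new dataset replaces the old one.
    run zfs set mountpoint=none "$OLD_DS"
    run zfs set mountpoint="$MNT" "$NEW_DS"
}

migrate_dataset
```

The old dataset is left in place with no mount point rather than destroyed, so it stays available for further inspection.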

Are there new checks in ZFS 2.2.2 that correctly detected the "permanent error", or might this be a bug?