openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

Unable to boot system with root on luks+zfs and with spare vdev #7740

Open jinnko opened 6 years ago

jinnko commented 6 years ago

System information

Type                  Version/Name
Distribution Name     Debian
Distribution Version  Stretch
Linux Kernel          4.9.0-7-amd64
Architecture          amd64
ZFS Version           0.7.9
SPL Version           0.7.9

Describe the problem you're observing

Creating a LUKS + ZFS setup with a RAID-1+0 layout plus a spare disk results in a system that doesn't boot if the spare is among the volumes unlocked by the initramfs. I'm not sure how to resolve this, so I'm raising this ticket as a starting point.

During boot, the initramfs cryptroot script unlocks the disks and then uses blkid to determine the "TYPE" of each unlocked volume. Active disks in the zpool report a zfs_member type, but the spare disk reports no type at all, so the cryptroot script takes a failure path at line 352 and evicts the unlocked device.
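To make the failure path concrete, here is a simplified sketch of the check described above. It is not the real Debian cryptroot script: `probe_fstype` and `check_unlocked_volume` are hypothetical names, and the only behavior modeled is "no TYPE reported means give up on the volume", using the error message seen at boot.

```sh
#!/bin/sh
# Simplified sketch of the post-unlock check; NOT the actual cryptroot code.
# Function names are hypothetical.

# Print the TYPE that blkid reports for a device (empty if none is found).
probe_fstype() {
    blkid -s TYPE -o value "$1" 2>/dev/null
}

# Decide what happens to an unlocked volume given its probed TYPE.
# Returns 0 if the volume would be kept, 1 if the script would give up on it
# (the real script then closes the mapping again).
check_unlocked_volume() {
    target="$1"
    fstype="$2"
    if [ -z "$fstype" ]; then
        echo "cryptsetup ($target): unknown fstype, bad password or options?"
        return 1
    fi
    return 0
}
```

On the affected system, `check_unlocked_volume luks-zfs5 "$(probe_fstype /dev/mapper/luks-zfs5)"` would take the failure branch, because blkid prints nothing for the spare, while the active members pass with `zfs_member`.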

I believe the correct fix is for blkid to return something useful when run against the spare disk, but I'm not sure how blkid determines the "TYPE", or whether that logic lives in the ZoL code at all.
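As far as I know, the ZFS prober in util-linux's libblkid recognizes a `zfs_member` by finding uberblocks, which begin with the magic number 0x00bab10c, in the uberblock ring of the vdev labels; I have not checked what else it requires. The rough diagnostic below applies a simplified version of that idea to label 0 only, so an active disk and the spare can be compared side by side. The 128 KiB ring offset, the 128 KiB ring size and the 1 KiB scan step are my assumptions about the on-disk layout, and `count_uberblocks` is a hypothetical helper, not part of any tool.

```sh
#!/bin/sh
# Rough diagnostic sketch (assumptions labeled): count slots in the uberblock
# ring of vdev label 0 that start with the uberblock magic 0x00bab10c.
# This is NOT libblkid's actual code.

UB_RING_OFFSET=131072   # assumed: ring starts 128 KiB into label 0
UB_RING_SIZE=131072     # assumed: ring is 128 KiB long
UB_STEP=1024            # assumed: smallest uberblock slot size

count_uberblocks() {
    dev="$1"
    found=0
    off=$UB_RING_OFFSET
    end=$((UB_RING_OFFSET + UB_RING_SIZE))
    while [ "$off" -lt "$end" ]; do
        # First 8 bytes of the slot as hex digits, spaces and newlines removed
        word=$(od -A n -t x1 -j "$off" -N 8 "$dev" | tr -d ' \n')
        case "$word" in
            # magic written little-endian or big-endian
            0cb1ba0000000000|0000000000bab10c) found=$((found + 1)) ;;
        esac
        off=$((off + UB_STEP))
    done
    echo "$found"
}
```

Run as root, `count_uberblocks /dev/mapper/luks-zfs1` and `count_uberblocks /dev/mapper/luks-zfs5` would show whether the spare's first label carries uberblocks at all; a nonzero count on the spare would point at some other check in the prober.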

Describe how to reproduce the problem

The build of this setup is automated and can be reproduced with Vagrant; everything needed to bootstrap the VMs is available at https://github.com/ixydo/vagrant-zfs-on-linux.

Otherwise you can do the following:

  1. Start with a Debian Stretch instance. This likely affects other distros too, but I haven't tested that.

  2. Create a zpool with a pair of mirrors like this:

    # zpool status
      pool: tank
     state: ONLINE
      scan: none requested
    config:
    
    NAME           STATE     READ WRITE CKSUM
    tank           ONLINE       0     0     0
      mirror-0     ONLINE       0     0     0
        luks-zfs1  ONLINE       0     0     0
        luks-zfs3  ONLINE       0     0     0
      mirror-1     ONLINE       0     0     0
        luks-zfs2  ONLINE       0     0     0
        luks-zfs4  ONLINE       0     0     0
    spares
      luks-zfs5    AVAIL
  3. Ensure all volumes are unlocked at boot by checking that each one has the initramfs option in /etc/crypttab:

# cat /etc/crypttab
# <target name> <source device>     <key file>  <options>
luks-zfs1 UUID=c9931349-3981-4bc7-8f12-5c550a621fac /etc/keys/luks/ata-Micron_5100_MTFDDAK240TCB_174719CC1445-part1 luks,discard,initramfs
luks-zfs2 UUID=45f62443-420a-4064-b7d0-cd59510f1041 /etc/keys/luks/ata-Micron_5100_MTFDDAK240TCB_174719CC146D-part1 luks,discard,initramfs
luks-zfs3 UUID=d016e052-7a73-43d0-b1c4-68eb49cad88b /etc/keys/luks/ata-Micron_5100_MTFDDAK240TCB_174719CC13EE-part1 luks,discard,initramfs
luks-zfs4 UUID=68412ade-b222-4b7a-9e56-67498547527c /etc/keys/luks/ata-Micron_5100_MTFDDAK240TCB_174719CC1494-part1 luks,discard,initramfs
luks-zfs5 UUID=d5ddbb87-c670-4856-aef3-1829db86f708 /etc/keys/luks/ata-Micron_5100_MTFDDAK240TCB_174719CB5FB3-part1 luks,discard,initramfs
  4. On reboot, unlocking the spare disk fails. Removing the initramfs option from the spare disk's entry in /etc/crypttab allows the system to boot, but the spare is then UNAVAIL:
# zpool status
  pool: tank
 state: ONLINE
  scan: none requested
config:

    NAME           STATE     READ WRITE CKSUM
    tank           ONLINE       0     0     0
      mirror-0     ONLINE       0     0     0
        luks-zfs1  ONLINE       0     0     0
        luks-zfs3  ONLINE       0     0     0
      mirror-1     ONLINE       0     0     0
        luks-zfs2  ONLINE       0     0     0
        luks-zfs4  ONLINE       0     0     0
    spares
      luks-zfs5    UNAVAIL 

errors: No known data errors
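Because whether a volume is unlocked in the initramfs hinges on the `initramfs` option, a quick way to see which crypttab entries lack it is a small awk filter. This is a minimal sketch: it assumes the usual four-field crypttab layout with comma-separated options in the fourth field, and `missing_initramfs` is a hypothetical helper name.

```sh
#!/bin/sh
# Print the target name of every crypttab entry whose options (4th field)
# do not include "initramfs". Reads crypttab on stdin.
missing_initramfs() {
    awk '
        /^[ \t]*(#|$)/ { next }    # skip comments and blank lines
        {
            n = split($4, opts, ",")
            found = 0
            for (i = 1; i <= n; i++)
                if (opts[i] == "initramfs") found = 1
            if (!found) print $1
        }
    '
}
```

With the configuration from step 3, `missing_initramfs < /etc/crypttab` prints nothing; with the workaround from step 4 applied, it prints `luks-zfs5`.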

Include any warning/errors/backtraces from the system logs

At boot the system drops to a shell in the initramfs and the only message it emits is the following:

cryptsetup (luks-zfs5): unknown fstype, bad password or options?
stale[bot] commented 4 years ago

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

jinnko commented 4 years ago

This is still an issue on Debian Stretch with kernel 4.9.0-13 and OpenZFS 0.7.12-2:

$ modinfo zfs
filename:       /lib/modules/4.9.0-13-amd64/updates/dkms/zfs.ko
version:        0.7.12-2+deb10u1~bpo9+1
license:        CDDL
author:         OpenZFS on Linux
description:    ZFS
srcversion:     A6D1B0339439B948E6BF693
...

As a first step toward a fix, it would help to clarify how the blkid tool determines the TYPE it reports, so that we can identify where the fix should be applied.

In case it helps, here is the blkid output for the disks:

For an active volume in the pool we get output like this:

# blkid /dev/mapper/luks-zfs1 
/dev/mapper/luks-zfs1: LABEL="tank" UUID="9389091757882612535" UUID_SUB="9629288558824527693" TYPE="zfs_member"

And for the spare volume we get no output:

# blkid /dev/mapper/luks-zfs5

I would expect the spare to be identified by blkid as TYPE="zfs_member" just like any other disk in the pool.
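To check every member in one go rather than one device at a time, a small loop over the mapped devices is enough. A minimal sketch; `report_types` is a hypothetical helper, and it only wraps the same blkid query used above.

```sh
#!/bin/sh
# Print the TYPE blkid reports for each device given as an argument, marking
# devices for which no signature was recognized. Run as root.
report_types() {
    for dev in "$@"; do
        t=$(blkid -s TYPE -o value "$dev" 2>/dev/null)
        printf '%s: %s\n' "$dev" "${t:-(no TYPE)}"
    done
}
```

Running `report_types /dev/mapper/luks-zfs*` on this system should list `zfs_member` for luks-zfs1 through luks-zfs4 and `(no TYPE)` for luks-zfs5. Adding `-p` to the blkid call forces low-level probing that bypasses the blkid cache, which rules out a stale cache entry as the cause.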

stale[bot] commented 3 years ago

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.