openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

Pool import fails when devices have the same serial number #13836

Open danderson opened 2 years ago

danderson commented 2 years ago

System information

Type                  Version/Name
Distribution Name     NixOS (linux)
Distribution Version  22.05.20220902.67e4507 (Quokka)
Kernel Version        5.15.64
Architecture          x86_64
OpenZFS Version       2.1.5-1

Describe the problem you're observing

My offsite backup storage is a VM that has whole physical drives mapped into it. Due to (I presume) a hypervisor configuration error on the host machine, all 3 drives in my array report the same device serial number:

# for i in b c d; do echo -n "/dev/vd$i: "; udevadm info --query=all --name=/dev/vd$i | grep SERIAL; done
/dev/vdb: E: ID_SERIAL=ST16000NM001G-2K_ZL2
/dev/vdc: E: ID_SERIAL=ST16000NM001G-2K_ZL2
/dev/vdd: E: ID_SERIAL=ST16000NM001G-2K_ZL2
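
For what it's worth, that serial string is exactly 20 characters, which matches virtio-blk's 20-byte serial limit, so my guess is the hypervisor is passing through the physical drive identifiers and truncating three distinct serials down to a shared prefix. If the host is libvirt/QEMU, each <disk> stanza can carry an explicit <serial>; a minimal sketch of the likely fix on the host (the VM name, source path, and serial value below are hypothetical placeholders):

# virsh edit <vm-name>
...
<disk type='block' device='disk'>
  <driver name='qemu' type='raw'/>
  <source dev='/dev/sdb'/>
  <target dev='vdb' bus='virtio'/>
  <!-- hypothetical: give each mapped drive its own short, unique serial -->
  <serial>ZL2D001</serial>
</disk>
...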

I can create a zpool with these devices just fine:

# zpool create -f data raidz1 vdb vdc vdd
# zpool status
  pool: data
 state: ONLINE
config:

    NAME        STATE     READ WRITE CKSUM
    data        ONLINE       0     0     0
      raidz1-0  ONLINE       0     0     0
        vdb     ONLINE       0     0     0
        vdc     ONLINE       0     0     0
        vdd     ONLINE       0     0     0

errors: No known data errors

The pool works fine at this point: reads, writes, scrubs, all happy. However, upon reboot:

# zpool status
  pool: data
 state: ONLINE
config:

    NAME                             STATE     READ WRITE CKSUM
    data                             ONLINE       0     0     0
      raidz1-0                       ONLINE       0     0     0
        vdb                          ONLINE       0     0     0
        vdc                          ONLINE       0     0     0
        virtio-ST16000NM001G-2K_ZL2  ONLINE       0     0     0

errors: No known data errors

Still fine, but note that vdd now appears under its serial-number name in the zpool output. In the race to own the /dev/disk/by-id symlink for the shared serial, vdd won on this particular boot. Even more worrying, Linux seems to have let different drives win the race for different partition symlinks:

# ls -l /dev/disk/by-id/virtio*
lrwxrwxrwx 1 root root  9 Sep  2 22:26 /dev/disk/by-id/virtio-ST16000NM001G-2K_ZL2 -> ../../vdd
lrwxrwxrwx 1 root root 10 Sep  2 22:26 /dev/disk/by-id/virtio-ST16000NM001G-2K_ZL2-part1 -> ../../vdd1
lrwxrwxrwx 1 root root 10 Sep  2 22:26 /dev/disk/by-id/virtio-ST16000NM001G-2K_ZL2-part9 -> ../../vdb9

Notice that the device and part1 symlink point at vdd, but the part9 symlink points at vdb.
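
(Aside: a quick way to confirm a collision like this without eyeballing symlinks; any output at all means at least two devices share a serial:)

# for i in b c d; do udevadm info --query=property --name=/dev/vd$i | grep '^ID_SERIAL='; done | sort | uniq -d
ID_SERIAL=ST16000NM001G-2K_ZL2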

Finally, after a few more reboots, a different drive wins the serial number ownership battle, and:

# zpool status
  pool: data
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
    invalid.  Sufficient replicas exist for the pool to continue
    functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-4J
config:

    NAME                             STATE     READ WRITE CKSUM
    data                             DEGRADED     0     0     0
      raidz1-0                       DEGRADED     0     0     0
        virtio-ST16000NM001G-2K_ZL2  ONLINE       0     0     0
        vdc                          ONLINE       0     0     0
        18299319051691703009         FAULTED      0     0     0  was /dev/disk/by-id/virtio-ST16000NM001G-2K_ZL2-part1

errors: No known data errors
# ls -l /dev/disk/by-id/virtio*
lrwxrwxrwx 1 root root  9 Sep  2 22:31 /dev/disk/by-id/virtio-ST16000NM001G-2K_ZL2 -> ../../vdb
lrwxrwxrwx 1 root root 10 Sep  2 22:31 /dev/disk/by-id/virtio-ST16000NM001G-2K_ZL2-part1 -> ../../vdb1
lrwxrwxrwx 1 root root 10 Sep  2 22:31 /dev/disk/by-id/virtio-ST16000NM001G-2K_ZL2-part9 -> ../../vdb9

With even more reboots, I can get vdc to win the race, at which point the pool goes FAULTED and all hope is lost.
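
For anyone who lands in the same state: on import, ZFS identifies vdevs by the GUIDs stored in their on-disk labels rather than by path, so pointing the import at the raw device nodes instead of the stale by-id symlinks should reassemble the pool no matter which drive won the race. Untested in this exact setup, but something like:

# zpool export data   # only if it imported degraded
# zpool import -d /dev data

Importing with -d /dev/disk/by-path instead should also work, and would pin each vdev to a bus position rather than a (shared) serial.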

Now, this is obviously a very silly way to run a zpool, and the immediate fix is "don't share serial numbers between drives". However, I was surprised that this confusion was enough to break ZFS: I'd have expected each device to carry an on-disk label that lets ZFS tell the drives apart and untangle the renaming.
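
Each vdev does in fact carry such a label, recording (among other things) its own GUID and the path it was last seen at; the numeric name 18299319051691703009 in the degraded status above is that GUID. The label can be inspected with zdb (output trimmed and illustrative, reusing the values from the status above):

# zdb -l /dev/vdd1
------------------------------------
LABEL 0
------------------------------------
    ...
    guid: 18299319051691703009
    path: '/dev/disk/by-id/virtio-ST16000NM001G-2K_ZL2-part1'
    ...

So the information needed to disambiguate the drives is on disk; the failure seems to be that import trusts the recorded device paths before falling back to a full scan.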

Describe how to reproduce the problem

Unsure exactly how the VM is configured (trying to get the libvirt config from the host now), but roughly: map three whole physical drives into a VM as virtio disks that all report the identical serial shown above, create a raidz1 pool across them, and reboot a few times.

Include any warning/errors/backtraces from the system logs

I checked dmesg and the system journal; ZFS logged nothing in either.

IvanVolosyuk commented 2 years ago

Another way to screw yourself is to dd one disk onto another. In that case the labels will be the same, but the serial numbers will be different. I wonder how ZFS would handle that case as well :)

This raises a question: which should be the primary indicator for ZFS, the ZFS labels or the device serial numbers? IMHO serial numbers should usually be more reliable than labels, and I wouldn't be surprised if ZFS preferred them in case of uncertainty.
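
For the dd case there is at least a documented escape hatch: strip the copied label from the clone before ZFS ever sees two vdevs with the same GUID. Assuming the clone ended up on /dev/sdX (hypothetical device name):

# zpool labelclear -f /dev/sdX1

zpool labelclear removes the ZFS label from the given device; -f forces it even if the device still looks like part of an active pool.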

stale[bot] commented 1 year ago

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.