openzfsonosx / zfs

OpenZFS on OS X
https://openzfsonosx.org/

Detaching 1/N resilvering disks caused remaining N-1 resilver to instantly succeed without completing #780

Open josephvusich opened 3 years ago

josephvusich commented 3 years ago

TLDR: I attached a third disk to every mirror VDEV. The new disk attached to the special VDEV was clearly bad (write errors), so I detached it. The resilver of the remaining new disks then instantly "completed" without error, even though it should have continued for hours. A manual scrub confirmed the problem: it found millions of CKSUM errors on the disks that had been incorrectly marked as resilvered.

More detailed walkthrough below. Note that mirror-0 and mirror-2 are made up of HDDs, and the special mirror-1 is made up of SSDs.

OS/ZFS version

$ zfs version
zfs-1.9.4-0
zfs-kmod-1.9.4-0

$ sw_vers
ProductName:    Mac OS X
ProductVersion: 10.15.6
BuildVersion:   19G2021

Initial pool layout

    NAME           STATE     READ WRITE CKSUM
    tank           ONLINE       0     0     0
      mirror-0     ONLINE       0     0     0
        media-0-0  ONLINE       0     0     0
        media-0-1  ONLINE       0     0     0
      mirror-2     ONLINE       0     0     0
        media-2-0  ONLINE       0     0     0
        media-2-1  ONLINE       0     0     0
    special 
      mirror-1     ONLINE       0     0     0
        media-1-0  ONLINE       0     0     0
        media-1-1  ONLINE       0     0     0

Adding mirrors

zpool attach tank media-1-0 /dev/disk8
zpool attach tank media-0-0 /dev/disk9
zpool attach tank media-2-0 /dev/disk10

One bad disk identified during resilver

(resilvering)

    NAME           STATE     READ WRITE CKSUM
    tank           ONLINE       0     0     0
      mirror-0     ONLINE       0     0     0
        media-0-0  ONLINE       0     0     0
        media-0-1  ONLINE       0     0     0
        disk9      ONLINE       0     0     0
      mirror-2     ONLINE       0     0     0
        media-2-0  ONLINE       0     0     0
        media-2-1  ONLINE       0     0     0
        disk10     ONLINE       0     0     0
    special 
      mirror-1     ONLINE       0     0     0
        media-1-0  ONLINE       0     0     0
        media-1-1  ONLINE       0     0     0
        disk8      ONLINE       0 4.08M   326
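
For anyone reproducing this, a bad disk like disk8 can be picked out of the status table mechanically. The following is a minimal sketch, assuming the five-column NAME/STATE/READ/WRITE/CKSUM layout shown above; `nonzero_error_devices` is a hypothetical helper name, not part of zpool.

```shell
# Hypothetical helper (not a zpool subcommand): reads `zpool status` text on
# stdin and prints each device whose READ, WRITE or CKSUM counter is non-zero.
nonzero_error_devices() {
    awk '
        # Device rows carry NAME STATE READ WRITE CKSUM, and the three
        # counters start with a digit. The digit test skips the header row
        # and other five-word lines in the status output.
        NF >= 5 && $3 ~ /^[0-9]/ && $4 ~ /^[0-9]/ && $5 ~ /^[0-9]/ &&
        ($3 != "0" || $4 != "0" || $5 != "0") {
            print $1, $3, $4, $5
        }
    '
}
```

It would be used as `zpool status tank | nonzero_error_devices`. Fed the table above, it should print only the disk8 row with its counters.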

Detach bad disk

zpool detach tank /dev/disk8

ZFS stops resilver for remaining disks without error

(no resilver in progress)

    NAME           STATE     READ WRITE CKSUM
    tank           ONLINE       0     0     0
      mirror-0     ONLINE       0     0     0
        media-0-0  ONLINE       0     0     0
        media-0-1  ONLINE       0     0     0
        disk9      ONLINE       0     0     0
      mirror-2     ONLINE       0     0     0
        media-2-0  ONLINE       0     0     0
        media-2-1  ONLINE       0     0     0
        disk10     ONLINE       0     0     0
    special 
      mirror-1     ONLINE       0     0     0
        media-1-0  ONLINE       0     0     0
        media-1-1  ONLINE       0     0     0
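
The premature stop can be caught by checking the scan state right before and right after the detach. This is a rough sketch, assuming the full `zpool status` output contains a `scan: resilver in progress` line while a resilver is running (the summaries in this report abbreviate that line); `resilver_state` is a hypothetical helper name.

```shell
# Hypothetical helper (not a zpool subcommand): reads `zpool status` text on
# stdin and prints "resilvering" if the scan line reports a running resilver,
# otherwise "idle".
resilver_state() {
    if grep -q 'scan: resilver in progress'; then
        echo resilvering
    else
        echo idle
    fi
}
```

Running `zpool status tank | resilver_state` before and after `zpool detach` would show the transition. A change from `resilvering` to `idle` immediately after the detach, when hours of work remained, matches the behavior described here.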

Start scrub

zpool scrub tank

The resilver was clearly not finished

(scrub in progress)

    NAME           STATE     READ WRITE CKSUM
    tank           ONLINE       0     0     0
      mirror-0     ONLINE       0     0     0
        media-0-0  ONLINE       0     0     0
        media-0-1  ONLINE       0     0     0
        disk9      ONLINE       0     0 3.08M
      mirror-2     ONLINE       0     0     0
        media-2-0  ONLINE       0     0     0
        media-2-1  ONLINE       0     0     0
        disk10     ONLINE       0     0 2.75M
    special 
      mirror-1     ONLINE       0     0     0
        media-1-0  ONLINE       0     0     0
        media-1-1  ONLINE       0     0     0