openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

zpool clear strangeness #12090

Open ghoshdipak opened 3 years ago

ghoshdipak commented 3 years ago

System information

    CentOS 8.2
    4.18.0-193.28.1.x5.0.22.x86_64

    modinfo zfs | grep -iw version
    version:        2.1.0-x.5.0_520_ge8c59ac5
    modinfo spl | grep -iw version
    version:        2.1.0-x.5.0_520_ge8c59ac5

Describe the problem you're observing

Intentional damage (zeroing out the entire device) was done to a single child vdev of a draid2 pool with 2 distributed spares. After running IO to the file system mounted on this pool, the pool became DEGRADED, as expected; the distributed spare was engaged and a resilver started. A subsequent scrub finished with zero bytes repaired. Issuing 'zpool clear' brings the DEGRADED pool and vdev back to the ONLINE state, but as soon as IO is run again, or the pool is scrubbed, the pool becomes DEGRADED again.
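In short, the cycle looks like this (a minimal sketch using the pool name from the reproduction below; standard zpool commands only, no new steps):

    # sketch of the observed cycle (pool name taken from the reproduction below)
    zpool status -v pool-oss0     # DEGRADED, the damaged child vdev shows "too many errors"
    zpool clear pool-oss0
    zpool status -v pool-oss0     # pool and vdev report ONLINE again
    zpool scrub pool-oss0         # or run application IO against the mounted dataset
    zpool status -v pool-oss0     # pool goes back to DEGRADED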

Describe how to reproduce the problem

    echo 1 > /sys/module/zfs/parameters/zfs_prefetch_disable
    cat /sys/module/zfs/parameters/zfs_prefetch_disable
    1
    truncate -s 1G file1 file2 file3 file4 file5 file6 file7 file8 file9 file10 file11 file12 file13 file14 file15
    pwd
    /root/test/files
    zpool create -f -o cachefile=none -o failmode=panic -O canmount=off pool-oss0 draid2:11d:15c:2s /root/test/files/file1 /root/test/files/file2 /root/test/files/file3 /root/test/files/file4 /root/test/files/file5 /root/test/files/file6 /root/test/files/file7 /root/test/files/file8 /root/test/files/file9 /root/test/files/file10 /root/test/files/file11 /root/test/files/file12 /root/test/files/file13 /root/test/files/file14 /root/test/files/file15
    zfs create -o mountpoint=/mnt/ost0 pool-oss0/ost0

    [root@localhost files]# fio --name=test --ioengine=libaio --fallocate=none --iodepth=4 --rw=write --bs=1M --size=8G --numjobs=1 --allow_mounted_write=1 --do_verify=0 --verify=pattern --verify_pattern=0x0A0B0C0D --filename=/mnt/ost0/testfile --group_reporting
    test: (g=0): rw=write, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=libaio, iodepth=4
    fio-3.7
    Starting 1 process
    test: Laying out IO file (1 file / 8192MiB)
    Jobs: 1 (f=1): [W(1)][100.0%][r=0KiB/s,w=66.6MiB/s][r=0,w=66 IOPS][eta 00m:00s]
    test: (groupid=0, jobs=1): err= 0: pid=10718: Wed May 19 22:26:15 2021
      write: IOPS=89, BW=89.0MiB/s (93.3MB/s)(8192MiB/92043msec)
        slat (usec): min=206, max=2843.9k, avg=11218.74, stdev=128346.34
        clat (usec): min=3, max=2848.3k, avg=33712.67, stdev=221020.14
         lat (usec): min=326, max=2852.2k, avg=44932.45, stdev=254463.05
        clat percentiles (usec):
         |  1.00th=[    701],  5.00th=[    750], 10.00th=[    791],
         | 20.00th=[    865], 30.00th=[    955], 40.00th=[   1029],
         | 50.00th=[   1106], 60.00th=[   1221], 70.00th=[   1418],
         | 80.00th=[   2212], 90.00th=[   8979], 95.00th=[  16909],
         | 99.00th=[1434452], 99.50th=[1602225], 99.90th=[2499806],
         | 99.95th=[2566915], 99.99th=[2835350]
       bw (  KiB/s): min=14307, max=411212, per=100.00%, avg=230380.26, stdev=94722.67, samples=72
       iops        : min=   13, max=  401, avg=224.57, stdev=92.53, samples=72
      lat (usec)   : 4=0.01%, 500=0.01%, 750=5.02%, 1000=31.23%
      lat (msec)   : 2=42.64%, 4=4.21%, 10=8.28%, 20=4.36%, 50=1.94%
      lat (msec)   : 100=0.22%, 750=0.04%, 1000=0.29%
      cpu          : usr=0.07%, sys=3.70%, ctx=2300, majf=0, minf=13
      IO depths    : 1=0.1%, 2=0.1%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
         submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
         complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
         issued rwts: total=0,8192,0,0 short=0,0,0,0 dropped=0,0,0,0
         latency   : target=0, window=0, percentile=100.00%, depth=4

    Run status group 0 (all jobs):
      WRITE: bw=89.0MiB/s (93.3MB/s), 89.0MiB/s-89.0MiB/s (93.3MB/s-93.3MB/s), io=8192MiB (8590MB), run=92043-92043msec

    [root@localhost files]# echo 3 > /proc/sys/vm/drop_caches
    [root@localhost files]# sync
    [root@localhost files]# fio --name=test --ioengine=libaio --fallocate=none --iodepth=4 --rw=read --bs=1M --size=8G --numjobs=1 --allow_mounted_write=1 --do_verify=1 --verify=pattern --verify_pattern=0x0A0B0C0D --filename=/mnt/ost0/testfile --group_reporting
    test: (g=0): rw=read, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=libaio, iodepth=4
    fio-3.7
    Starting 1 process
    Jobs: 1 (f=1): [V(1)][100.0%][r=503MiB/s,w=0KiB/s][r=503,w=0 IOPS][eta 00m:00s]
    test: (groupid=0, jobs=1): err= 0: pid=27966: Wed May 19 22:26:59 2021
      read: IOPS=353, BW=354MiB/s (371MB/s)(8192MiB/23161msec)
        slat (usec): min=650, max=502000, avg=2625.01, stdev=13294.92
        clat (usec): min=121, max=533083, avg=8483.66, stdev=23148.47
         lat (usec): min=1190, max=534382, avg=11111.37, stdev=26851.01
        clat percentiles (msec):
         |  1.00th=[    4],  5.00th=[    4], 10.00th=[    4], 20.00th=[    5],
         | 30.00th=[    5], 40.00th=[    6], 50.00th=[    6], 60.00th=[    7],
         | 70.00th=[    8], 80.00th=[    9], 90.00th=[   12], 95.00th=[   16],
         | 99.00th=[   44], 99.50th=[  117], 99.90th=[  502], 99.95th=[  506],
         | 99.99th=[  535]
       bw (  KiB/s): min=39844, max=610304, per=99.63%, avg=360838.15, stdev=151393.14, samples=46
       iops        : min=   38, max=  596, avg=352.26, stdev=147.95, samples=46
      lat (usec)   : 250=0.01%
      lat (msec)   : 2=0.01%, 4=19.31%, 10=66.26%, 20=11.69%, 50=1.78%
      lat (msec)   : 100=0.37%, 250=0.34%, 500=0.10%, 750=0.12%
      cpu          : usr=5.66%, sys=39.62%, ctx=33035, majf=0, minf=525
      IO depths    : 1=0.1%, 2=0.1%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
         submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
         complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
         issued rwts: total=8192,0,0,0 short=0,0,0,0 dropped=0,0,0,0
         latency   : target=0, window=0, percentile=100.00%, depth=4

    Run status group 0 (all jobs):
       READ: bw=354MiB/s (371MB/s), 354MiB/s-354MiB/s (371MB/s-371MB/s), io=8192MiB (8590MB), run=23161-23161msec

    [root@localhost files]# echo 3 > /proc/sys/vm/drop_caches
    [root@localhost files]# dd if=/dev/zero of=file1 bs=1M count=1024 oflag=direct
    1024+0 records in
    1024+0 records out
    1073741824 bytes (1.1 GB, 1.0 GiB) copied, 0.812825 s, 1.3 GB/s
    [root@localhost files]# echo 3 > /proc/sys/vm/drop_caches
    [root@localhost files]# echo 3 > /proc/sys/vm/drop_caches
    [root@localhost files]# fio --name=test --ioengine=libaio --fallocate=none --iodepth=4 --rw=read --bs=1M --size=8G --numjobs=1 --allow_mounted_write=1 --do_verify=1 --verify=pattern --verify_pattern=0x0A0B0C0D --filename=/mnt/ost0/testfile --group_reporting
    test: (g=0): rw=read, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=libaio, iodepth=4
    fio-3.7
    Starting 1 process
    Jobs: 1 (f=1): [V(1)][100.0%][r=501MiB/s,w=0KiB/s][r=501,w=0 IOPS][eta 00m:00s]
    test: (groupid=0, jobs=1): err= 0: pid=72740: Wed May 19 22:28:32 2021
      read: IOPS=184, BW=185MiB/s (194MB/s)(8192MiB/44367msec)
        slat (usec): min=548, max=501887, avg=5117.17, stdev=16143.65
        clat (usec): min=131, max=1124.1k, avg=16247.34, stdev=36912.74
         lat (usec): min=1040, max=1404.9k, avg=21368.78, stdev=45991.99
        clat percentiles (msec):
         |  1.00th=[    3],  5.00th=[    4], 10.00th=[    4], 20.00th=[    5],
         | 30.00th=[    6], 40.00th=[    7], 50.00th=[    8], 60.00th=[   11],
         | 70.00th=[   14], 80.00th=[   19], 90.00th=[   33], 95.00th=[   49],
         | 99.00th=[  138], 99.50th=[  218], 99.90th=[  506], 99.95th=[  726],
         | 99.99th=[ 1116]
       bw (  KiB/s): min= 2043, max=913848, per=99.76%, avg=188614.28, stdev=158911.91, samples=88
       iops        : min=    1, max=  892, avg=183.91, stdev=155.26, samples=88
      lat (usec)   : 250=0.01%
      lat (msec)   : 2=0.01%, 4=15.56%, 10=43.88%, 20=22.18%, 50=13.67%
      lat (msec)   : 100=3.20%, 250=1.15%, 500=0.18%, 750=0.10%, 1000=0.02%
      cpu          : usr=3.78%, sys=21.27%, ctx=30665, majf=0, minf=524
      IO depths    : 1=0.1%, 2=0.1%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
         submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
         complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
         issued rwts: total=8192,0,0,0 short=0,0,0,0 dropped=0,0,0,0
         latency   : target=0, window=0, percentile=100.00%, depth=4

    Run status group 0 (all jobs):
       READ: bw=185MiB/s (194MB/s), 185MiB/s-185MiB/s (194MB/s-194MB/s), io=8192MiB (8590MB), run=44367-44367msec
    [root@localhost files]#

    [root@localhost ~]# zpool status -v 2
      pool: pool-oss0
     state: ONLINE
    config:

    NAME                         STATE     READ WRITE CKSUM
    pool-oss0                    ONLINE       0     0     0
      draid2:11d:15c:2s-0        ONLINE       0     0     0
        /root/test/files/file1   ONLINE       0     0     0
        /root/test/files/file2   ONLINE       0     0     0
        /root/test/files/file3   ONLINE       0     0     0
        /root/test/files/file4   ONLINE       0     0     0
        /root/test/files/file5   ONLINE       0     0     0
        /root/test/files/file6   ONLINE       0     0     0
        /root/test/files/file7   ONLINE       0     0     0
        /root/test/files/file8   ONLINE       0     0     0
        /root/test/files/file9   ONLINE       0     0     0
        /root/test/files/file10  ONLINE       0     0     0
        /root/test/files/file11  ONLINE       0     0     0
        /root/test/files/file12  ONLINE       0     0     0
        /root/test/files/file13  ONLINE       0     0     0
        /root/test/files/file14  ONLINE       0     0     0
        /root/test/files/file15  ONLINE       0     0     0
    spares
      draid2-0-0                 AVAIL
      draid2-0-1                 AVAIL

errors: No known data errors

With no IO running, the same status as above keeps being reported, because the pool has not yet read the damaged blocks and therefore has not recognized the failure.
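Because the zeroed backing file is only detected when its blocks are actually read, the damage stays latent until reads occur. One way to surface it without application IO is a scrub (a minimal sketch using standard zpool commands and the same pool name as above):

    zpool status -x               # still reports "all pools are healthy" at this point
    zpool scrub pool-oss0         # force every allocated block to be read
    zpool status -v pool-oss0     # the damaged child vdev now accumulates checksum errors and goes DEGRADED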

While running IO from another terminal, the following status was observed:

      pool: pool-oss0
     state: DEGRADED
    status: One or more devices has experienced an unrecoverable error.  An
            attempt was made to correct the error.  Applications are unaffected.
    action: Determine if the device needs to be replaced, and clear the errors
            using 'zpool clear' or replace the device with 'zpool replace'.
       see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P
    config:

    NAME                         STATE     READ WRITE CKSUM
    pool-oss0                    DEGRADED     0     0     0
      draid2:11d:15c:2s-0        DEGRADED     0     0     0
        /root/test/files/file1   DEGRADED     0     0 1.88K  too many errors
        /root/test/files/file2   ONLINE       0     0     0
        /root/test/files/file3   ONLINE       0     0     0
        /root/test/files/file4   ONLINE       0     0     0
        /root/test/files/file5   ONLINE       0     0     0
        /root/test/files/file6   ONLINE       0     0     0
        /root/test/files/file7   ONLINE       0     0     0
        /root/test/files/file8   ONLINE       0     0     0
        /root/test/files/file9   ONLINE       0     0     0
        /root/test/files/file10  ONLINE       0     0     0
        /root/test/files/file11  ONLINE       0     0     0
        /root/test/files/file12  ONLINE       0     0     0
        /root/test/files/file13  ONLINE       0     0     0
        /root/test/files/file14  ONLINE       0     0     0
        /root/test/files/file15  ONLINE       0     0     0
    spares
      draid2-0-0                 AVAIL
      draid2-0-1                 AVAIL

errors: No known data errors

..............................

      pool: pool-oss0
     state: DEGRADED
    status: One or more devices is currently being resilvered.  The pool will
            continue to function, possibly in a degraded state.
    action: Wait for the resilver to complete.
      scan: resilver (draid2:11d:15c:2s-0) in progress since Wed May 19 22:27:49 2021
            9.75G scanned at 714M/s, 8.31G issued 608M/s, 9.75G total
            655M resilvered, 100.00% done, 00:00:00 to go
    config:

    NAME                          STATE     READ WRITE CKSUM
    pool-oss0                     DEGRADED     0     0     0
      draid2:11d:15c:2s-0         DEGRADED     0     0     0
        spare-0                   DEGRADED     0     0 4.72K
          /root/test/files/file1  DEGRADED     0     0 2.23K  too many errors
          draid2-0-0              ONLINE       0     0     0  (resilvering)
        /root/test/files/file2    ONLINE       0     0     0  (resilvering)
        /root/test/files/file3    ONLINE       0     0     0  (resilvering)
        /root/test/files/file4    ONLINE       0     0     0  (resilvering)
        /root/test/files/file5    ONLINE       0     0     0  (resilvering)
        /root/test/files/file6    ONLINE       0     0     0  (resilvering)
        /root/test/files/file7    ONLINE       0     0     0  (resilvering)
        /root/test/files/file8    ONLINE       0     0     0  (resilvering)
        /root/test/files/file9    ONLINE       0     0     0  (resilvering)
        /root/test/files/file10   ONLINE       0     0     0  (resilvering)
        /root/test/files/file11   ONLINE       0     0     0  (resilvering)
        /root/test/files/file12   ONLINE       0     0     0  (resilvering)
        /root/test/files/file13   ONLINE       0     0     0  (resilvering)
        /root/test/files/file14   ONLINE       0     0     0  (resilvering)
        /root/test/files/file15   ONLINE       0     0     0  (resilvering)
    spares
      draid2-0-0                  INUSE     currently in use
      draid2-0-1                  AVAIL

errors: No known data errors

      pool: pool-oss0
     state: DEGRADED
    status: One or more devices has experienced an unrecoverable error.  An
            attempt was made to correct the error.  Applications are unaffected.
    action: Determine if the device needs to be replaced, and clear the errors
            using 'zpool clear' or replace the device with 'zpool replace'.
       see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P
      scan: scrub in progress since Wed May 19 22:28:03 2021
            9.75G scanned at 4.88G/s, 855M issued at 428M/s, 9.75G total
            0B repaired, 8.56% done, 00:00:21 to go
      scan: resilvered (draid2:11d:15c:2s-0) 655M in 00:00:14 with 0 errors on Wed May 19 22:28:03 2021
    config:

    NAME                          STATE     READ WRITE CKSUM
    pool-oss0                     DEGRADED     0     0     0
      draid2:11d:15c:2s-0         DEGRADED     0     0     0
        spare-0                   DEGRADED     0     0 4.74K
          /root/test/files/file1  DEGRADED     0     0 2.23K  too many errors
          draid2-0-0              ONLINE       0     0     0
        /root/test/files/file2    ONLINE       0     0     0
        /root/test/files/file3    ONLINE       0     0     0
        /root/test/files/file4    ONLINE       0     0     0
        /root/test/files/file5    ONLINE       0     0     0
        /root/test/files/file6    ONLINE       0     0     0
        /root/test/files/file7    ONLINE       0     0     0
        /root/test/files/file8    ONLINE       0     0     0
        /root/test/files/file9    ONLINE       0     0     0
        /root/test/files/file10   ONLINE       0     0     0
        /root/test/files/file11   ONLINE       0     0     0
        /root/test/files/file12   ONLINE       0     0     0
        /root/test/files/file13   ONLINE       0     0     0
        /root/test/files/file14   ONLINE       0     0     0
        /root/test/files/file15   ONLINE       0     0     0
    spares
      draid2-0-0                  INUSE     currently in use
      draid2-0-1                  AVAIL

errors: No known data errors

    [root@localhost ~]# zpool clear pool-oss0
    [root@localhost ~]# zpool status -v
      pool: pool-oss0
     state: ONLINE
      scan: scrub repaired 0B in 00:00:20 with 0 errors on Wed May 19 22:28:23 2021
      scan: resilvered (draid2:11d:15c:2s-0) 655M in 00:00:14 with 0 errors on Wed May 19 22:28:03 2021
    config:

    NAME                         STATE     READ WRITE CKSUM
    pool-oss0                    ONLINE       0     0     0
      draid2:11d:15c:2s-0        ONLINE       0     0     0
        /root/test/files/file1   ONLINE       0     0     0
        /root/test/files/file2   ONLINE       0     0     0
        /root/test/files/file3   ONLINE       0     0     0
        /root/test/files/file4   ONLINE       0     0     0
        /root/test/files/file5   ONLINE       0     0     0
        /root/test/files/file6   ONLINE       0     0     0
        /root/test/files/file7   ONLINE       0     0     0
        /root/test/files/file8   ONLINE       0     0     0
        /root/test/files/file9   ONLINE       0     0     0
        /root/test/files/file10  ONLINE       0     0     0
        /root/test/files/file11  ONLINE       0     0     0
        /root/test/files/file12  ONLINE       0     0     0
        /root/test/files/file13  ONLINE       0     0     0
        /root/test/files/file14  ONLINE       0     0     0
        /root/test/files/file15  ONLINE       0     0     0
    spares
      draid2-0-0                 AVAIL
      draid2-0-1                 AVAIL

errors: No known data errors

When IO was run from another terminal, the following status was observed:

    [root@localhost ~]# zpool status -v
      pool: pool-oss0
     state: DEGRADED
    status: One or more devices is currently being resilvered.  The pool will
            continue to function, possibly in a degraded state.
    action: Wait for the resilver to complete.
      scan: scrub repaired 0B in 00:00:20 with 0 errors on Wed May 19 22:28:23 2021
      scan: resilver (draid2:11d:15c:2s-0) in progress since Wed May 19 22:29:00 2021
            2.21G scanned at 693M/s, 1.83G issued 575M/s, 9.75G total
            100M resilvered, 22.65% done, 00:00:11 to go
    config:

    NAME                          STATE     READ WRITE CKSUM
    pool-oss0                     DEGRADED     0     0     0
      draid2:11d:15c:2s-0         DEGRADED     0     0     0
        spare-0                   DEGRADED     0     0 1.20K
          /root/test/files/file1  DEGRADED     0     0 1.04K  too many errors
          draid2-0-0              ONLINE       0     0     0  (resilvering)
        /root/test/files/file2    ONLINE       0     0     0
        /root/test/files/file3    ONLINE       0     0     0  (resilvering)
        /root/test/files/file4    ONLINE       0     0     0
        /root/test/files/file5    ONLINE       0     0     0
        /root/test/files/file6    ONLINE       0     0     0  (resilvering)
        /root/test/files/file7    ONLINE       0     0     0  (resilvering)
        /root/test/files/file8    ONLINE       0     0     0  (resilvering)
        /root/test/files/file9    ONLINE       0     0     0  (resilvering)
        /root/test/files/file10   ONLINE       0     0     0  (resilvering)
        /root/test/files/file11   ONLINE       0     0     0  (resilvering)
        /root/test/files/file12   ONLINE       0     0     0
        /root/test/files/file13   ONLINE       0     0     0  (resilvering)
        /root/test/files/file14   ONLINE       0     0     0  (resilvering)
        /root/test/files/file15   ONLINE       0     0     0  (resilvering)
    spares
      draid2-0-0                  INUSE     currently in use
      draid2-0-1                  AVAIL

errors: No known data errors

stale[bot] commented 2 years ago

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

ghoshdipak commented 2 years ago

We are still experiencing the same issue with OpenZFS 2.1.4

behlendorf commented 2 years ago

Thanks for retesting this with OpenZFS 2.1.4, I'll look into what's exactly going on here. Somewhat related to this, you might want to look at PR #13499, which slightly changes the ZED behavior. That fix was queued up for 2.1.5.

akashb-22 commented 2 years ago

The same issue is seen when testing with PR #13499: zpool_clear.txt

behlendorf commented 2 years ago

@akashb-22 would you mind verifying that the fix in #13555 resolves this issue?

akashb-22 commented 2 years ago
  1. Case 1: The issue with damage to a single vdev works fine with the fix.
  2. Case 2: When the same test is run with damage done to 2 child vdevs on draid2, permanent errors are reported by zpool status -v, but the actual data is correct (verified using md5sum; see the sketch after this list).
  3. Case 3: When the same test is run with damage done to 2 child vdevs on draid2, the system panics and goes into a weird state. Either case 2 or case 3 is observed depending on which 2 child vdevs are chosen. The same is not observed on raidz2. zpool_clear1.txt
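For reference, a minimal sketch of the case 2 check described above (file and pool names follow the original reproduction; the exact md5sum scripting is an assumption):

    # sketch: damage two child vdevs, then verify that the file data still reads back correctly
    md5sum /mnt/ost0/testfile > /tmp/before.md5                               # checksums before damage
    dd if=/dev/zero of=/root/test/files/file1 bs=1M count=1024 oflag=direct   # damage child vdev 1
    dd if=/dev/zero of=/root/test/files/file2 bs=1M count=1024 oflag=direct   # damage child vdev 2
    echo 3 > /proc/sys/vm/drop_caches
    zpool scrub pool-oss0
    zpool status -v pool-oss0     # may list permanent errors
    md5sum -c /tmp/before.md5     # the data itself still verifies OK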
behlendorf commented 2 years ago

Thanks for the quick turnaround. I see what's going on with cases 2 and 3 and will work on updating the PR to take a slightly different approach.

behlendorf commented 2 years ago

I've updated the PR. Would you mind testing the updated patch?

akashb-22 commented 2 years ago

  1. Case 1: Damage to a single vdev works fine with the fix.
  2. Case 2: Damage to 2 child vdevs on draid2, repeated multiple times along with zpool clear, works fine with the fix.
  3. Case 3: Damaging 2 child vdevs one after the other on draid2 + 2 dspares results in all of the child vdevs going to the DEGRADED state and eventually leads to data loss; however, a zpool clear followed by a scrub brings the pool back to the ONLINE state, and all the data files are present in the filesystem with no loss. This case works fine with raidz2.

Also, the following error is momentarily seen from zpool status during the case 3 tests:

    munmap_chunk(): invalid pointer
    Aborted (core dumped)

zpool_clear2.txt

behlendorf commented 2 years ago

That third test case really is a nasty, absolute worst-case scenario. I was able to reproduce it locally, but it'll take me a while to get to a root cause. In the meantime, let's move forward with the current PR, which at least sorts out those first two test cases.

akashb-22 commented 7 months ago

@behlendorf Another case is observed: after the rebuild, once a scrub is in progress, issuing a zpool clear on DEBUG builds panics at this point. However, I do not see any issue on normal (non-debug) builds. I have a test case where it is 100% reproducible. This is seen on zfs-2.1-release, zfs-2.2-release, and master. Stack trace:

[Tue Feb 13 00:55:35 2024] VERIFY(spa_config_held(os->os_spa, SCL_ALL, RW_WRITER) == 0 || (spa_is_root(os->os_spa) && spa_config_held(os->os_spa, SCL_STATE, RW_WRITER))) failed
[Tue Feb 13 00:55:35 2024] PANIC at dnode.c:1422:dnode_hold_impl()
[Tue Feb 13 00:55:35 2024] Showing stack for process 186367
[Tue Feb 13 00:55:35 2024] CPU: 4 PID: 186367 Comm: zpool Kdump: loaded Tainted: P           OE    --------- -  - 4.18.0-372.9.1.el8.x86_64 #1
[Tue Feb 13 00:55:35 2024] Hardware name: OpenStack Foundation OpenStack Nova, BIOS 1.10.2-1ubuntu1 04/01/2014
[Tue Feb 13 00:55:35 2024] Call Trace:
[Tue Feb 13 00:55:35 2024]  dump_stack+0x41/0x60
[Tue Feb 13 00:55:35 2024]  spl_panic+0xd0/0xe8 [spl]
[Tue Feb 13 00:55:35 2024]  ? virtqueue_add+0xace/0xb40
[Tue Feb 13 00:55:35 2024]  dnode_hold_impl+0x755/0x11c0 [zfs]
[Tue Feb 13 00:55:35 2024]  ? cpumask_next_and+0x1a/0x20
[Tue Feb 13 00:55:35 2024]  ? update_sd_lb_stats.constprop.121+0xca/0x830
[Tue Feb 13 00:55:35 2024]  dmu_buf_hold_noread+0x37/0x140 [zfs]
[Tue Feb 13 00:55:35 2024]  dmu_buf_hold+0x37/0x80 [zfs]
[Tue Feb 13 00:55:35 2024]  zap_lockdir+0x54/0x110 [zfs]
[Tue Feb 13 00:55:35 2024]  ? find_busiest_group+0x41/0x360
[Tue Feb 13 00:55:35 2024]  zap_lookup_norm+0x5d/0xd0 [zfs]
[Tue Feb 13 00:55:35 2024]  zap_lookup+0x12/0x20 [zfs]
[Tue Feb 13 00:55:35 2024]  zfs_dirent_lock+0x5b5/0x7c0 [zfs]
[Tue Feb 13 00:55:35 2024]  zfs_dirlook+0x8a/0x300 [zfs]
[Tue Feb 13 00:55:35 2024]  ? zfs_zaccess+0x21e/0x440 [zfs]
[Tue Feb 13 00:55:35 2024]  zfs_lookup+0x24b/0x400 [zfs]
[Tue Feb 13 00:55:35 2024]  zpl_lookup+0xc2/0x290 [zfs]
[Tue Feb 13 00:55:35 2024]  __lookup_slow+0x97/0x150
[Tue Feb 13 00:55:35 2024]  lookup_slow+0x35/0x50
[Tue Feb 13 00:55:35 2024]  walk_component+0x1bf/0x2f0
[Tue Feb 13 00:55:35 2024]  ? legitimize_path.isra.42+0x2d/0x60
[Tue Feb 13 00:55:35 2024]  path_lookupat.isra.47+0x75/0x200
[Tue Feb 13 00:55:35 2024]  ? filename_lookup.part.61+0xe0/0x170
[Tue Feb 13 00:55:35 2024]  ? kmem_cache_free+0x116/0x300
[Tue Feb 13 00:55:35 2024]  filename_lookup.part.61+0xa0/0x170
[Tue Feb 13 00:55:35 2024]  ? __switch_to_asm+0x35/0x70
[Tue Feb 13 00:55:35 2024]  ? getname_kernel+0x2c/0x100
[Tue Feb 13 00:55:35 2024]  ? kmem_cache_alloc+0x13f/0x280
[Tue Feb 13 00:55:35 2024]  ? vdev_draid_open_spares+0x30/0x30 [zfs]
[Tue Feb 13 00:55:35 2024]  ? vdev_draid_open_spares+0x30/0x30 [zfs]
[Tue Feb 13 00:55:35 2024]  lookup_bdev.part.46+0x28/0xa0
[Tue Feb 13 00:55:35 2024]  zvol_is_zvol_impl+0xb/0x40 [zfs]
[Tue Feb 13 00:55:35 2024]  vdev_uses_zvols+0x18/0x70 [zfs]
[Tue Feb 13 00:55:35 2024]  vdev_uses_zvols+0x3d/0x70 [zfs]
[Tue Feb 13 00:55:35 2024]  vdev_uses_zvols+0x3d/0x70 [zfs]
[Tue Feb 13 00:55:35 2024]  vdev_open_children_impl+0xaa/0x120 [zfs]
[Tue Feb 13 00:55:35 2024]  vdev_draid_open+0x63/0x1f0 [zfs]
[Tue Feb 13 00:55:35 2024]  vdev_open+0xe4/0xa10 [zfs]
[Tue Feb 13 00:55:35 2024]  ? mutex_lock+0xe/0x30
[Tue Feb 13 00:55:35 2024]  vdev_reopen+0x4e/0x1f0 [zfs]
[Tue Feb 13 00:55:35 2024]  vdev_clear+0x188/0x2b0 [zfs]
[Tue Feb 13 00:55:35 2024]  vdev_clear+0x87/0x2b0 [zfs]
[Tue Feb 13 00:55:35 2024]  vdev_clear+0x87/0x2b0 [zfs]
[Tue Feb 13 00:55:35 2024]  vdev_clear+0x87/0x2b0 [zfs]
[Tue Feb 13 00:55:35 2024]  zfs_ioc_clear+0x18b/0x380 [zfs]
[Tue Feb 13 00:55:35 2024]  ? strlcpy+0x2d/0x40
[Tue Feb 13 00:55:35 2024]  zfsdev_ioctl_common+0x656/0x980 [zfs]
[Tue Feb 13 00:55:35 2024]  ? __kmalloc_node+0x26e/0x2a0
[Tue Feb 13 00:55:35 2024]  ? __handle_mm_fault+0x7a6/0x7e0
[Tue Feb 13 00:55:35 2024]  zfsdev_ioctl+0x4f/0xe0 [zfs]
[Tue Feb 13 00:55:35 2024]  do_vfs_ioctl+0xa4/0x680
[Tue Feb 13 00:55:35 2024]  ? handle_mm_fault+0xc1/0x1e0
[Tue Feb 13 00:55:35 2024]  ? syscall_trace_enter+0x1fb/0x2c0
[Tue Feb 13 00:55:35 2024]  ksys_ioctl+0x60/0x90
[Tue Feb 13 00:55:35 2024]  __x64_sys_ioctl+0x16/0x20
[Tue Feb 13 00:55:35 2024]  do_syscall_64+0x5b/0x1a0
[Tue Feb 13 00:55:35 2024]  entry_SYSCALL_64_after_hwframe+0x65/0xca
[Tue Feb 13 00:55:35 2024] RIP: 0033:0x7fea9f6836db
[Tue Feb 13 00:55:35 2024] Code: 73 01 c3 48 8b 0d ad 57 38 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 7d 57 38 00 f7 d8 64 89 01 48
[Tue Feb 13 00:55:35 2024] RSP: 002b:00007ffd5047ce08 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[Tue Feb 13 00:55:35 2024] RAX: ffffffffffffffda RBX: 00007ffd5047d250 RCX: 00007fea9f6836db
[Tue Feb 13 00:55:35 2024] RDX: 00007ffd5047d250 RSI: 0000000000005a21 RDI: 0000000000000003
[Tue Feb 13 00:55:35 2024] RBP: 00007ffd50480830 R08: 0000000000000040 R09: 0000000001b9721c
[Tue Feb 13 00:55:35 2024] R10: 000000000000001c R11: 0000000000000246 R12: 0000000001b5db70
[Tue Feb 13 00:55:35 2024] R13: 0000000000000000 R14: 0000000001b61bc0 R15: 00007ffd5047ce50

Further testing shows this issue is:

  1. not seen on raidz2 + 2 spares
  2. not seen on draid without dspares
  3. not seen on draid with hot spares

The issue is always (100%) hit with the above test case on draid2 with 2 dspares.
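For reference, a minimal sketch of the sequence that trips the assertion (the --enable-debug build flag and the exact timing are assumptions; pool and file names follow the earlier reproduction):

    # sketch: assumes a ZFS build with assertions enabled (e.g. ./configure --enable-debug)
    dd if=/dev/zero of=/root/test/files/file1 bs=1M count=1024 oflag=direct   # damage one child vdev
    # ... run IO so the dspare kicks in, and wait for the sequential rebuild to finish ...
    zpool scrub pool-oss0         # start a scrub
    zpool clear pool-oss0         # issued while the scrub is still running: DEBUG builds hit the
                                  # VERIFY in dnode_hold_impl() via the vdev_clear()/vdev_reopen() path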