openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

MMP suspend after HDD read error during scrub #16078

Open stuartthebruce opened 6 months ago

stuartthebruce commented 6 months ago

System information

Type Version/Name
Distribution Name Rocky Linux
Distribution Version 8.9
Kernel Version 4.18.0-513.24.1.el8_9
Architecture x86_64
OpenZFS Version 2.2.3

Describe the problem you're observing

The pool was suspended by MMP (multihost protection) after an HDD read error during a scrub.

Describe how to reproduce the problem

Run zpool scrub on the pool, then wait for an HDD read error.

Include any warning/errors/backtraces from the system logs

This system is already running with zfs_multihost_fail_intervals set to 10x its default value,

[root@zfs4 ~]# cat /sys/module/zfs/parameters/zfs_multihost_fail_intervals 
100

so it would be nice to find another solution rather than just increasing that further.
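
For context, the OpenZFS module parameter documentation describes the suspension window as zfs_multihost_fail_intervals multiplied by zfs_multihost_interval, in milliseconds since the last successful MMP write. The sketch below only illustrates that arithmetic. The value of zfs_multihost_interval is not shown in this report, so the 1000 ms used in the usage note is an assumption (it is the documented default), and the helper names are hypothetical.

```python
from pathlib import Path

# Directory where the Linux ZFS module exposes its tunables (same path as above).
PARAM_DIR = Path("/sys/module/zfs/parameters")


def read_param(name: str, param_dir: Path = PARAM_DIR) -> int:
    """Read one integer module parameter, e.g. zfs_multihost_fail_intervals."""
    return int((param_dir / name).read_text().strip())


def mmp_suspend_window_ms(fail_intervals: int, interval_ms: int) -> int:
    """Milliseconds without a successful MMP write before the pool is suspended.

    A fail_intervals value of 0 means failed MMP writes never suspend the
    pool; that case simply yields 0 here.
    """
    return fail_intervals * interval_ms
```

With the value shown above and an assumed interval of 1000 ms, mmp_suspend_window_ms(100, 1000) gives 100000 ms. On a host with the module loaded, the two inputs can be read with read_param("zfs_multihost_fail_intervals") and read_param("zfs_multihost_interval").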

Apr 10 14:33:30 zfs4 kernel: sd 9:0:33:0: [sdco] tag#6842 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=6s
Apr 10 14:33:30 zfs4 kernel: sd 9:0:33:0: [sdco] tag#6842 Sense Key : Medium Error [current] [descriptor] 
Apr 10 14:33:30 zfs4 kernel: sd 9:0:33:0: [sdco] tag#6842 Add. Sense: Unrecovered read error
Apr 10 14:33:30 zfs4 kernel: sd 9:0:33:0: [sdco] tag#6842 CDB: Read(16) 88 00 00 00 00 04 c1 88 c8 18 00 00 01 d8 00 00
Apr 10 14:33:30 zfs4 kernel: blk_update_request: critical medium error, dev sdco, sector 20426836088 op 0x0:(READ) flags 0x4200 phys_seg 37 prio class 0
Apr 10 14:33:30 zfs4 kernel: blk_update_request: critical medium error, dev dm-8, sector 20426835992 op 0x0:(READ) flags 0x0 phys_seg 49 prio class 0
Apr 10 14:33:30 zfs4 kernel: zio pool=home3 vdev=/dev/disk/by-id/wwn-0x5000cca25316e978 error=61 type=1 offset=10458540027904 size=241664 flags=1074267312
Apr 10 14:33:31 zfs4 zed[3157923]: eid=120513 class=io pool='home3' vdev=wwn-0x5000cca25316e978 size=241664 offset=10458540027904 priority=4 err=61 flags=0x400804b0 delay=6276ms
Apr 10 14:33:31 zfs4 zed[3157925]: eid=120514 class=io pool='home3' vdev=wwn-0x5000cca25316e978 size=4096 offset=10458540228608 priority=4 err=61 flags=0x3800b0 bookmark=324709:5301317:0:0
Apr 10 14:33:31 zfs4 zed[3157928]: eid=120515 class=io pool='home3' vdev=wwn-0x5000cca25316e978 size=4096 offset=10458540224512 priority=4 err=61 flags=0x3800b0 bookmark=324709:5301317:0:1
Apr 10 14:33:31 zfs4 zed[3157931]: eid=120516 class=io pool='home3' vdev=wwn-0x5000cca25316e978 size=8192 offset=10458540216320 priority=4 err=61 flags=0x3800b0 bookmark=324709:5319451:0:1
Apr 10 14:33:31 zfs4 zed[3157933]: eid=120517 class=io pool='home3' vdev=wwn-0x5000cca25316e978 size=4096 offset=10458540212224 priority=4 err=61 flags=0x3800b0 bookmark=324709:5301317:0:3
Apr 10 14:33:31 zfs4 zed[3157936]: eid=120518 class=io pool='home3' vdev=wwn-0x5000cca25316e978 size=4096 offset=10458540208128 priority=4 err=61 flags=0x3800b0 bookmark=324709:5319451:0:2
Apr 10 14:33:31 zfs4 zed[3157939]: eid=120519 class=io pool='home3' vdev=wwn-0x5000cca25316e978 size=4096 offset=10458540204032 priority=4 err=61 flags=0x3800b0 bookmark=324709:5319451:0:0
Apr 10 14:33:31 zfs4 zed[3157944]: eid=120520 class=io pool='home3' vdev=wwn-0x5000cca25316e978 size=4096 offset=10458540199936 priority=4 err=61 flags=0x3800b0 bookmark=324709:5301317:0:2
Apr 10 14:33:31 zfs4 zed[3157947]: eid=120522 class=io pool='home3' vdev=wwn-0x5000cca25316e978 size=4096 offset=10458540175360 priority=4 err=61 flags=0x3800b0 bookmark=324709:5318293:0:1
Apr 10 14:33:31 zfs4 zed[3157946]: eid=120521 class=io pool='home3' vdev=wwn-0x5000cca25316e978 size=4096 offset=10458540183552 priority=4 err=61 flags=0x3800b0 bookmark=324709:5361373:0:3
Apr 10 14:33:31 zfs4 zed[3157952]: eid=120523 class=io pool='home3' vdev=wwn-0x5000cca25316e978 size=4096 offset=10458540171264 priority=4 err=61 flags=0x3800b0 bookmark=324709:5318293:0:0
Apr 10 14:33:31 zfs4 zed[3157953]: eid=120524 class=io pool='home3' vdev=wwn-0x5000cca25316e978 size=4096 offset=10458540167168 priority=4 err=61 flags=0x3800b0 bookmark=324709:5311264:0:0
Apr 10 14:33:31 zfs4 zed[3157958]: eid=120525 class=io pool='home3' vdev=wwn-0x5000cca25316e978 size=4096 offset=10458540163072 priority=4 err=61 flags=0x3800b0 bookmark=324709:5318293:0:3
Apr 10 14:33:31 zfs4 zed[3157959]: eid=120526 class=io pool='home3' vdev=wwn-0x5000cca25316e978 size=4096 offset=10458540158976 priority=4 err=61 flags=0x3800b0 bookmark=324709:5311264:0:2
Apr 10 14:33:31 zfs4 zed[3157963]: eid=120527 class=io pool='home3' vdev=wwn-0x5000cca25316e978 size=4096 offset=10458540154880 priority=4 err=61 flags=0x3800b0 bookmark=275495:235754732:0:0
Apr 10 14:33:31 zfs4 zed[3157965]: eid=120528 class=io pool='home3' vdev=wwn-0x5000cca25316e978 size=4096 offset=10458540130304 priority=4 err=61 flags=0x3800b0 bookmark=324709:5311264:0:1
Apr 10 14:33:31 zfs4 zed[3157970]: eid=120529 class=io pool='home3' vdev=wwn-0x5000cca25316e978 size=8192 offset=10458540142592 priority=4 err=61 flags=0x3800b0 bookmark=324709:5310264:0:0
Apr 10 14:33:31 zfs4 zed[3157974]: eid=120530 class=io pool='home3' vdev=wwn-0x5000cca25316e978 size=4096 offset=10458540126208 priority=4 err=61 flags=0x3800b0 bookmark=324709:5371846:0:2
Apr 10 14:33:31 zfs4 zed[3157977]: eid=120531 class=io pool='home3' vdev=wwn-0x5000cca25316e978 size=4096 offset=10458540113920 priority=4 err=61 flags=0x3800b0 bookmark=324709:5293797:0:1
Apr 10 14:33:31 zfs4 zed[3157983]: eid=120532 class=io pool='home3' vdev=wwn-0x5000cca25316e978 size=8192 offset=10458540118016 priority=4 err=61 flags=0x3800b0 bookmark=324709:5293797:0:0
Apr 10 14:33:31 zfs4 zed[3157984]: eid=120533 class=io pool='home3' vdev=wwn-0x5000cca25316e978 size=4096 offset=10458540109824 priority=4 err=61 flags=0x3800b0 bookmark=324709:5304134:0:1
Apr 10 14:33:31 zfs4 zed[3157989]: eid=120534 class=io pool='home3' vdev=wwn-0x5000cca25316e978 size=4096 offset=10458540105728 priority=4 err=61 flags=0x3800b0 bookmark=324709:5290178:0:3
Apr 10 14:33:31 zfs4 zed[3157991]: eid=120535 class=io pool='home3' vdev=wwn-0x5000cca25316e978 size=4096 offset=10458540101632 priority=4 err=61 flags=0x3800b0 bookmark=324709:5293797:0:2
Apr 10 14:33:31 zfs4 zed[3157996]: eid=120536 class=io pool='home3' vdev=wwn-0x5000cca25316e978 size=4096 offset=10458540077056 priority=4 err=61 flags=0x3800b0 bookmark=324709:5237804:0:4
Apr 10 14:33:31 zfs4 zed[3157997]: eid=120537 class=io pool='home3' vdev=wwn-0x5000cca25316e978 size=4096 offset=10458540097536 priority=4 err=61 flags=0x3800b0 bookmark=324709:5304134:0:0
Apr 10 14:33:31 zfs4 zed[3158003]: eid=120539 class=io pool='home3' vdev=wwn-0x5000cca25316e978 size=4096 offset=10458540089344 priority=4 err=61 flags=0x3800b0 bookmark=324709:5237804:0:1
Apr 10 14:33:31 zfs4 zed[3158002]: eid=120538 class=io pool='home3' vdev=wwn-0x5000cca25316e978 size=4096 offset=10458540093440 priority=4 err=61 flags=0x3800b0 bookmark=324709:5290178:0:2
Apr 10 14:33:31 zfs4 zed[3158008]: eid=120540 class=io pool='home3' vdev=wwn-0x5000cca25316e978 size=4096 offset=10458540085248 priority=4 err=61 flags=0x3800b0 bookmark=324709:5304134:0:2
Apr 10 14:33:31 zfs4 zed[3158010]: eid=120541 class=io pool='home3' vdev=wwn-0x5000cca25316e978 size=4096 offset=10458540081152 priority=4 err=61 flags=0x3800b0 bookmark=324709:5304134:0:3
Apr 10 14:33:31 zfs4 zed[3158012]: eid=120542 class=io pool='home3' vdev=wwn-0x5000cca25316e978 size=4096 offset=10458540068864 priority=4 err=61 flags=0x3800b0 bookmark=324709:5290178:0:0
Apr 10 14:33:31 zfs4 zed[3158016]: eid=120543 class=io pool='home3' vdev=wwn-0x5000cca25316e978 size=4096 offset=10458540072960 priority=4 err=61 flags=0x3800b0 bookmark=324709:5237804:0:0
Apr 10 14:33:31 zfs4 zed[3158018]: eid=120544 class=io pool='home3' vdev=wwn-0x5000cca25316e978 size=4096 offset=10458540060672 priority=4 err=61 flags=0x3800b0 bookmark=324709:5199190:0:3
Apr 10 14:33:31 zfs4 zed[3158020]: eid=120545 class=io pool='home3' vdev=wwn-0x5000cca25316e978 size=4096 offset=10458540064768 priority=4 err=61 flags=0x3800b0 bookmark=324709:5199190:0:4
Apr 10 14:33:31 zfs4 zed[3158023]: eid=120546 class=io pool='home3' vdev=wwn-0x5000cca25316e978 size=4096 offset=10458540056576 priority=4 err=61 flags=0x3800b0 bookmark=324709:5282403:0:1
Apr 10 14:33:31 zfs4 zed[3158027]: eid=120547 class=io pool='home3' vdev=wwn-0x5000cca25316e978 size=4096 offset=10458540044288 priority=4 err=61 flags=0x3800b0 bookmark=324709:5282403:0:2
Apr 10 14:33:31 zfs4 zed[3158031]: eid=120548 class=io pool='home3' vdev=wwn-0x5000cca25316e978 size=4096 offset=10458540052480 priority=4 err=61 flags=0x3800b0 bookmark=324709:5199190:0:5
Apr 10 14:33:31 zfs4 zed[3158033]: eid=120549 class=io pool='home3' vdev=wwn-0x5000cca25316e978 size=4096 offset=10458540040192 priority=4 err=61 flags=0x3800b0 bookmark=324709:5173096:0:1
Apr 10 14:33:31 zfs4 zed[3158036]: eid=120550 class=io pool='home3' vdev=wwn-0x5000cca25316e978 size=4096 offset=10458540036096 priority=4 err=61 flags=0x3800b0 bookmark=324709:5282403:0:0
Apr 10 14:33:31 zfs4 zed[3158037]: eid=120551 class=io pool='home3' vdev=wwn-0x5000cca25316e978 size=12288 offset=10458540257280 priority=4 err=61 flags=0x3800b0 bookmark=275495:235756067:0:0
Apr 10 14:33:31 zfs4 zed[3158040]: eid=120552 class=io pool='home3' vdev=wwn-0x5000cca25316e978 size=4096 offset=10458540032000 priority=4 err=61 flags=0x3800b0 bookmark=324709:5173096:0:2
Apr 10 14:33:31 zfs4 zed[3158042]: eid=120553 class=io pool='home3' vdev=wwn-0x5000cca25316e978 size=4096 offset=10458540027904 priority=4 err=61 flags=0x3800b0 bookmark=324709:5173096:0:0
Apr 10 14:34:37 zfs4 kernel: sd 9:0:33:0: attempting task abort!scmd(0x0000000080dd3514), outstanding for 30460 ms & timeout 30000 ms
Apr 10 14:34:37 zfs4 kernel: sd 9:0:33:0: [sdco] tag#1756 CDB: Read(16) 88 00 00 00 00 04 c1 89 1f f0 00 00 03 f8 00 00
Apr 10 14:34:37 zfs4 kernel: scsi target9:0:33: handle(0x002d), sas_address(0x5000cca25316e979), phy(32)
Apr 10 14:34:37 zfs4 kernel: scsi target9:0:33: enclosure logical id(0x5000ccab04003d00), slot(41) 
Apr 10 14:34:37 zfs4 kernel: scsi target9:0:33: enclosure level(0x0000), connector name(     )
Apr 10 14:34:37 zfs4 kernel: sd 9:0:33:0: task abort: SUCCESS scmd(0x0000000080dd3514)
Apr 10 14:34:47 zfs4 systemd[1]: Starting NVMf auto discovery service...
Apr 10 14:34:47 zfs4 kernel: nvme nvme11: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 10.7.0.84:4420
Apr 10 14:34:47 zfs4 kernel: nvme nvme11: Removing ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery"
Apr 10 14:34:47 zfs4 kernel: nvme nvme11: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 10.7.0.176:4420
Apr 10 14:34:47 zfs4 kernel: nvme nvme11: Removing ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery"
Apr 10 14:34:47 zfs4 kernel: nvme nvme11: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 10.7.0.177:4420
Apr 10 14:34:47 zfs4 kernel: nvme nvme11: Removing ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery"
Apr 10 14:34:47 zfs4 systemd[1]: nvme_fabrics_persistent.service: Succeeded.
Apr 10 14:34:47 zfs4 systemd[1]: Started NVMf auto discovery service.
Apr 10 14:34:47 zfs4 systemd[1]: nvme_fabrics_persistent.service: Consumed 10ms CPU time
Apr 10 14:34:51 zfs4 zed[3158674]: eid=120554 class=delay pool='home3' vdev=wwn-0x5000cca25316e978 size=1048576 offset=10458551013376 priority=4 err=0 flags=0x400804b0 delay=44052ms
Apr 10 14:35:01 zfs4 rpc.mountd[15629]: authenticated unmount request from 10.14.0.39:951 for /home3/dan.kozak (/home3/dan.kozak)
Apr 10 14:35:07 zfs4 kernel: sd 0:0:33:0: attempting task abort!scmd(0x000000009ad99bda), outstanding for 30143 ms & timeout 30000 ms
Apr 10 14:35:07 zfs4 kernel: sd 0:0:33:0: [sdan] tag#7543 CDB: Read(16) 88 00 00 00 00 04 c1 89 d0 68 00 00 03 e8 00 00
Apr 10 14:35:07 zfs4 kernel: scsi target0:0:33: handle(0x002d), sas_address(0x5000cca25316e97a), phy(32)
Apr 10 14:35:07 zfs4 kernel: scsi target0:0:33: enclosure logical id(0x5000ccab04003d00), slot(41) 
Apr 10 14:35:07 zfs4 kernel: scsi target0:0:33: enclosure level(0x0000), connector name(     )
Apr 10 14:35:07 zfs4 kernel: sd 0:0:33:0: task abort: SUCCESS scmd(0x000000009ad99bda)
Apr 10 14:35:10 zfs4 kernel: sd 0:0:33:0: attempting task abort!scmd(0x00000000ff96ced7), outstanding for 30077 ms & timeout 30000 ms
Apr 10 14:35:10 zfs4 kernel: sd 0:0:33:0: [sdan] tag#1694 CDB: Read(16) 88 00 00 00 00 04 c1 89 d8 68 00 00 03 d8 00 00
Apr 10 14:35:10 zfs4 kernel: scsi target0:0:33: handle(0x002d), sas_address(0x5000cca25316e97a), phy(32)
Apr 10 14:35:10 zfs4 kernel: scsi target0:0:33: enclosure logical id(0x5000ccab04003d00), slot(41) 
Apr 10 14:35:10 zfs4 kernel: scsi target0:0:33: enclosure level(0x0000), connector name(     )
Apr 10 14:35:10 zfs4 kernel: sd 0:0:33:0: task abort: SUCCESS scmd(0x00000000ff96ced7)
Apr 10 14:35:26 zfs4 zed[3158897]: eid=120555 class=delay pool='home3' vdev=wwn-0x5000cca25316e978 size=1048576 offset=10458574135296 priority=4 err=0 flags=0x400804b0 delay=49039ms
Apr 10 14:35:33 zfs4 zed[3158955]: eid=120556 class=delay pool='home3' vdev=wwn-0x5000cca25316e978 size=1040384 offset=10458575183872 priority=4 err=0 flags=0x400804b0 delay=53832ms
Apr 10 14:35:59 zfs4 kernel: WARNING: MMP writes to pool 'home3' have not succeeded in over 133188 ms; suspending pool. Hrtime 258714375343622
Apr 10 14:35:59 zfs4 kernel: WARNING: Pool 'home3' has encountered an uncorrectable I/O failure and has been suspended.
Apr 10 14:35:59 zfs4 zed[3159122]: eid=120557 class=statechange pool='home3' vdev=wwn-0x5000cca25316e978 vdev_state=FAULTED
Apr 10 14:35:59 zfs4 zed[3159123]: eid=120558 class=io_failure pool='home3'
Apr 10 14:35:59 zfs4 zed[3159133]: vdev wwn-0x5000cca25316e978 set '/sys/class/enclosure/0:0:0:0/SLOT 41,8DGDLMBY            /fault' LED to 1
Apr 10 14:39:47   pool: home3
Apr 10 14:39:47  state: SUSPENDED
Apr 10 14:39:47 status: The pool is suspended because multihost writes failed or were delayed;
Apr 10 14:39:47     another system could import the pool undetected.
Apr 10 14:39:47 action: Make sure the pool's devices are connected, then reboot your system and
Apr 10 14:39:47     import the pool.
Apr 10 14:39:47    see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-MM
Apr 10 14:39:47   scan: scrub in progress since Tue Apr  9 10:21:39 2024
Apr 10 14:39:47     166T / 601T scanned at 1.67G/s, 160T / 601T issued at 1.61G/s
Apr 10 14:39:47     180K repaired, 26.62% done, 3 days 06:02:12 to go
Apr 10 14:39:47 config:
Apr 10 14:39:47 
Apr 10 14:39:47     NAME                                 STATE     READ WRITE CKSUM
Apr 10 14:39:47     home3                                DEGRADED     0     0     0
Apr 10 14:39:47       raidz3-0                           ONLINE       0     0     0
Apr 10 14:39:47         wwn-0x5000cca2531dd934           ONLINE       0     0     0
Apr 10 14:39:47         wwn-0x5000cca2531e2a94           ONLINE       0     0     0
Apr 10 14:39:47         wwn-0x5000cca29187c34c           ONLINE       0     0     0
Apr 10 14:39:47         wwn-0x5000cca2b8482d4c           ONLINE       0     0     0
Apr 10 14:39:47         wwn-0x5000cca26fcec890           ONLINE       0     0     0
Apr 10 14:39:47         wwn-0x5000cca2531e5f6c           ONLINE       0     0     0
Apr 10 14:39:47         wwn-0x5000cca27003bb00           ONLINE       0     0     0
Apr 10 14:39:47         wwn-0x5000cca2531e84ec           ONLINE       0     0     0
Apr 10 14:39:47         wwn-0x5000cca2531e868c           ONLINE       0     0     0
Apr 10 14:39:47         wwn-0x5000cca2531e87d8           ONLINE       0     0     0
Apr 10 14:39:47         wwn-0x5000cca2531e9750           ONLINE       0     0     0
Apr 10 14:39:47         wwn-0x5000cca2b0d97f08           ONLINE       0     0     0
Apr 10 14:39:47         wwn-0x5000cca2531e9f48           ONLINE       0     0     0
Apr 10 14:39:47         wwn-0x5000cca2531eb96c           ONLINE       0     0     0
Apr 10 14:39:47         wwn-0x5000cca2531ec858           ONLINE       0     0     0
Apr 10 14:39:47       raidz3-1                           DEGRADED     0     0     0
Apr 10 14:39:47         wwn-0x5000cca2530aa110           ONLINE       0     0     0
Apr 10 14:39:47         wwn-0x5000cca2530aa424           ONLINE       0     0     0
Apr 10 14:39:47         wwn-0x5000cca2530aacb4           ONLINE       0     0     0
Apr 10 14:39:47         wwn-0x5000cca2530e297c           ONLINE       0     0     0
Apr 10 14:39:47         wwn-0x5000cca2530f661c           ONLINE       0     0     0
Apr 10 14:39:47         wwn-0x5000cca291abaed0           ONLINE       0     0     0
Apr 10 14:39:47         wwn-0x5000cca253123c08           ONLINE       0     0     0
Apr 10 14:39:47         wwn-0x5000cca253158878           ONLINE       0     0     0
Apr 10 14:39:47         wwn-0x5000cca253168af4           ONLINE       0     0     0
Apr 10 14:39:47         wwn-0x5000cca2b0044388           ONLINE       0     0     0
Apr 10 14:39:47         wwn-0x5000cca25316d614           ONLINE       0     0     0
Apr 10 14:39:47         wwn-0x5000cca25316e978           FAULTED     41     0     0  too many errors
Apr 10 14:39:47         wwn-0x5000cca253178e50           ONLINE       0     0     0
Apr 10 14:39:47         wwn-0x5000cca2531bc948           ONLINE       0     0     0
Apr 10 14:39:47         wwn-0x5000cca2531d7500           ONLINE       0     0     0
Apr 10 14:39:47       raidz3-2                           ONLINE       0     0     0
Apr 10 14:39:47         wwn-0x5000cca2530a4188           ONLINE       0     0     0
Apr 10 14:39:47         wwn-0x5000cca2530a461c           ONLINE       0     0     0
Apr 10 14:39:47         wwn-0x5000cca2530a4868           ONLINE       0     0     0
Apr 10 14:39:47         wwn-0x5000cca2530a4918           ONLINE       0     0     0
Apr 10 14:39:47         wwn-0x5000cca2530a4bc4           ONLINE       0     0     0
Apr 10 14:39:47         wwn-0x5000cca2530a58a4           ONLINE       0     0     0
Apr 10 14:39:47         dm-uuid-mpath-35000cca2530a63f4  ONLINE       0     0     0
Apr 10 14:39:47         wwn-0x5000cca2530a6adc           ONLINE       0     0     0
Apr 10 14:39:47         wwn-0x5000cca2530a6f3c           ONLINE       0     0     0
Apr 10 14:39:47         wwn-0x5000cca2530a7108           ONLINE       0     0     0
Apr 10 14:39:47         wwn-0x5000cca2530a7130           ONLINE       0     0     0
Apr 10 14:39:47         wwn-0x5000cca2530a7288           ONLINE       0     0     0
Apr 10 14:39:47         wwn-0x5000cca2530a7408           ONLINE       0     0     0
Apr 10 14:39:47         wwn-0x5000cca2530a7428           ONLINE       0     0     0
Apr 10 14:39:47         wwn-0x5000cca2530a7494           ONLINE       0     0     0
Apr 10 14:39:47       raidz3-3                           ONLINE       0     0     0
Apr 10 14:39:47         wwn-0x5000cca253032d58           ONLINE       0     0     0
Apr 10 14:39:47         scsi-35000cca253075af0           ONLINE       0     0     0
Apr 10 14:39:47         wwn-0x5000cca253084ee4           ONLINE       0     0     0
Apr 10 14:39:47         wwn-0x5000cca2aa29bc90           ONLINE       0     0     0
Apr 10 14:39:47         wwn-0x5000cca25308a028           ONLINE       0     0     0
Apr 10 14:39:47         wwn-0x5000cca253090cbc           ONLINE       0     0     0
Apr 10 14:39:47         wwn-0x5000cca253091b9c           ONLINE       0     0     0
Apr 10 14:39:47         wwn-0x5000cca2530925bc           ONLINE       0     0     0
Apr 10 14:39:47         wwn-0x5000cca253093758           ONLINE       0     0     0
Apr 10 14:39:47         wwn-0x5000cca25309e6d8           ONLINE       0     0     0
Apr 10 14:39:47         wwn-0x5000cca2530a0dd0           ONLINE       0     0     0
Apr 10 14:39:47         wwn-0x5000cca2530a0f64           ONLINE       0     0     0
Apr 10 14:39:47         wwn-0x5000cca2530a1140           ONLINE       0     0     0
Apr 10 14:39:47         wwn-0x5000cca2530a21c4           ONLINE       0     0     0
Apr 10 14:39:47         wwn-0x5000cca2530a23b4           ONLINE       0     0     0
Apr 10 14:39:47     special 
Apr 10 14:39:47       mirror-4                           ONLINE       0     0     0
Apr 10 14:39:47         zfs-0ef5386f750f66d9             ONLINE       0     0     0
Apr 10 14:39:47         zfs-a2c8272a11a49ac3             ONLINE       0     0     0
Apr 10 14:39:47       mirror-6                           ONLINE       0     0     0
Apr 10 14:39:47         zfs-ef648f8d22311c93             ONLINE       0     0     0
Apr 10 14:39:47         zfs-9f637799f030adc6             ONLINE       0     0     0
Apr 10 14:39:47     logs    
Apr 10 14:39:47       system-slog                        ONLINE       0     0     0
Apr 10 14:39:47     cache
Apr 10 14:39:47       nvme4n1                            ONLINE       0     0     0
Apr 10 14:39:47       nvme5n1                            ONLINE       0     0     0
Apr 10 14:39:47 
Apr 10 14:39:47 errors: No known data errors
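
The zed lines in the log above carry key=value fields, so they can be summarized mechanically. The following is a minimal sketch, not part of ZFS or zed: it tallies class=io events and collects class=delay latencies per vdev. The function names are hypothetical, and the sample lines are copied from the log above.

```python
import re
from collections import defaultdict

# Matches key=value pairs in a zed log line; values may be quoted with '...'.
PAIR_RE = re.compile(r"(\w+)=('[^']*'|\S+)")


def parse_zed_line(line: str) -> dict:
    """Return the key=value fields of one zed syslog line as a dict."""
    return {key: value.strip("'") for key, value in PAIR_RE.findall(line)}


def summarize(lines):
    """Count class=io events and list class=delay latencies (ms) per vdev."""
    io_errors = defaultdict(int)
    delays_ms = defaultdict(list)
    for line in lines:
        if " zed[" not in line:
            continue  # skip kernel, systemd and other non-zed lines
        fields = parse_zed_line(line)
        vdev = fields.get("vdev")
        if vdev is None:
            continue  # zed lines without a vdev= field are ignored
        delay = fields.get("delay", "")
        if fields.get("class") == "io":
            io_errors[vdev] += 1
        elif fields.get("class") == "delay" and delay.endswith("ms"):
            delays_ms[vdev].append(int(delay[:-2]))
    return dict(io_errors), dict(delays_ms)


# Four lines copied from the log above: one io event, two delay events,
# and the kernel MMP warning (which the summary skips).
SAMPLE = [
    "Apr 10 14:33:31 zfs4 zed[3157925]: eid=120514 class=io pool='home3' vdev=wwn-0x5000cca25316e978 size=4096 offset=10458540228608 priority=4 err=61 flags=0x3800b0 bookmark=324709:5301317:0:0",
    "Apr 10 14:34:51 zfs4 zed[3158674]: eid=120554 class=delay pool='home3' vdev=wwn-0x5000cca25316e978 size=1048576 offset=10458551013376 priority=4 err=0 flags=0x400804b0 delay=44052ms",
    "Apr 10 14:35:33 zfs4 zed[3158955]: eid=120556 class=delay pool='home3' vdev=wwn-0x5000cca25316e978 size=1040384 offset=10458575183872 priority=4 err=0 flags=0x400804b0 delay=53832ms",
    "Apr 10 14:35:59 zfs4 kernel: WARNING: MMP writes to pool 'home3' have not succeeded in over 133188 ms; suspending pool. Hrtime 258714375343622",
]

io_errors, delays_ms = summarize(SAMPLE)
```

Run over the full log rather than this sample, the same tally should count the 41 class=io events (eid 120513 through 120553) for wwn-0x5000cca25316e978, which agrees with the READ column in the zpool status output.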
stuartthebruce commented 6 months ago

Note: rebooting the systems brought the pool back online without any problems. However, I am holding off on restarting the zpool scrub to see if anyone has any ideas.

stuartthebruce commented 6 months ago

I also received an email notification of the pool going into the FAULTED state with the same timestamp (second resolution) as the syslog suspended message.

The number of I/O errors associated with a ZFS device exceeded
acceptable levels. ZFS has marked the device as faulted.

impact: Fault tolerance of the pool may be compromised.
   eid: 120557
 class: statechange
 state: FAULTED
  host: zfs4
  time: 2024-04-10 14:35:59-0700
 vpath: /dev/disk/by-id/wwn-0x5000cca25316e978
 vphys: /dev/disk/by-uuid/5574758552459228835
 vguid: 0x327D78DC0905E688
 devid: dm-uuid-mpath-35000cca25316e978
  pool: home3 (0xB112664CA12DAEBE)
            
rincebrain commented 6 months ago

#15839 seems germane.

stuartthebruce commented 6 months ago

> #15839 seems germane.

Awesome, especially being able to zpool clear an MMP suspended pool that has not been imported by another system!