openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

ZED will not engage hot-spare and resilver upon drive state change to faulted #8967

Open AeonJJohnson opened 5 years ago

AeonJJohnson commented 5 years ago

System information

Type Version/Name
Distribution Name CentOS
Distribution Version 7.5
Linux Kernel 3.10.0-862.14.4.el7
Architecture x86_64
ZFS Version 0.7.9
SPL Version 0.7.9

ZFS config environment

Type Version/Name
Multipath Yes
Storage Architecture SAS
Import directory /dev/disk/by-vdev/
UDev config ZFS vdev_id

Describe the problem you're observing

ZED will not engage the hot spare upon drive failure. The failing drive gets faulted, the pool goes degraded, zpool history shows the state change, and ZED sees the failure and sets the enclosure fault LED, but the hot spare is not engaged and the resilver to the hot spare does not start.

Describe how to reproduce the problem

Build a pool. Wait for a drive to fail blocks. Profit.

Include any warning/errors/backtraces from the system logs

ZED sees the fault state change on the failing drive and turns on its chassis LED

/var/log/messages:Jun 28 00:02:50 myhost zed: eid=164 class=io pool_guid=0x1E0D9667DBEC61A2 vdev_path=/dev/disk/by-vdev/A0
/var/log/messages:Jun 28 00:02:50 myhost zed: eid=165 class=io pool_guid=0x1E0D9667DBEC61A2 vdev_path=/dev/disk/by-vdev/A0
/var/log/messages:Jun 28 00:02:50 myhost zed: eid=166 class=io pool_guid=0x1E0D9667DBEC61A2 vdev_path=/dev/disk/by-vdev/A0
/var/log/messages:Jun 28 00:02:50 myhost zed: eid=167 class=probe_failure pool_guid=0x1E0D9667DBEC61A2 vdev_path=/dev/disk/by-vdev/A0
/var/log/messages:Jun 28 00:02:50 myhost zed: eid=168 class=statechange pool_guid=0x1E0D9667DBEC61A2 vdev_path=/dev/disk/by-vdev/A0 vdev_state=FAULTED
/var/log/messages:Jun 28 00:02:50 myhost zed: vdev A0 set '/sys/class/enclosure/0:0:123:0/SLOT 00,VAG3J8KL            /fault' LED to 1
/var/log/messages:Jun 28 00:02:50 myhost zed: eid=169 class=config_sync pool_guid=0x1E0D9667DBEC61A2

Pool events show state change:

Jun 28 2019 00:02:50.226417802 ereport.fs.zfs.io
Jun 28 2019 00:02:50.226417802 ereport.fs.zfs.io
Jun 28 2019 00:02:50.230417785 ereport.fs.zfs.io
Jun 28 2019 00:02:50.230417785 ereport.fs.zfs.probe_failure
Jun 28 2019 00:02:50.230417785 resource.fs.zfs.statechange
Jun 28 2019 00:02:50.674415948 sysevent.fs.zfs.config_sync

The pool has a dedicated hot spare that is available

  pool: mypool
 state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
        repaired.
  scan: scrub repaired 0B in 8h24m with 0 errors on Mon Jun 17 06:54:36 2019
config:

        NAME        STATE     READ WRITE CKSUM
        mypool    DEGRADED     0     0     0
          raidz2-0  DEGRADED     0     0     0
            A0      FAULTED     12     1     0  too many errors
            A1      ONLINE       0     0     0
            A2      ONLINE       0     0     0
            A3      ONLINE       0     0     0
            A4      ONLINE       0     0     0
            A5      ONLINE       0     0     0
            A6      ONLINE       0     0     0
            A7      ONLINE       0     0     0
            A8      ONLINE       0     0     0
            A9      ONLINE       0     0     0
            A10     ONLINE       0     0     0
            A11     ONLINE       0     0     0
            A12     ONLINE       0     0     0
            A13     ONLINE       0     0     0
            A14     ONLINE       0     0     0
        spares
          S1        AVAIL

errors: No known data errors

Running zed -F -v indicates that zed can properly see the current degraded state of the pool

Registered zedlet "statechange-led.sh"
Registered zedlet "statechange-notify.sh"
Registered zedlet "vdev_attach-led.sh"
Ignoring "zed.rc": not executable by user
Registered zedlet "pool_import-led.sh"
Registered zedlet "resilver_finish-notify.sh"
Registered zedlet "scrub_finish-notify.sh"
Registered zedlet "data-notify.sh"
Ignoring "zed-functions.sh": not executable by user
Registered zedlet "resilver_finish-start-scrub.sh"
Registered zedlet "vdev_clear-led.sh"
Registered zedlet "all-syslog.sh"
ZFS Event Daemon 0.7.9-1 (PID 159198)
Add Agent: init
Diagnosis Engine: register module
Retire Agent: register module
zed_disk_event_init
Waiting for new udev disk events...
Processing events since eid=169
zfs_unavail_pool: examining 'mypool' (state 6)
zfs_unavail_pool: examining 'yourpool' (state 7)
^CExiting
zed_disk_event_fini
zfs_agent_consumer_thread: exiting
Retire Agent: fmd.accepted: 0
Retire Agent: unregister module
Diagnosis Engine: fmd.accepted: 0
Diagnosis Engine: fmd.caseopen: 0
Diagnosis Engine: fmd.casesolved: 0
Diagnosis Engine: fmd.caseclosed: 0
Diagnosis Engine: old_drops: 0
Diagnosis Engine: dev_drops: 0
Diagnosis Engine: vdev_drops: 0
Diagnosis Engine: import_drops: 0
Diagnosis Engine: resource_drops: 0
Diagnosis Engine: unregister module
Add Agent: fini

I am able to manually resilver to the hot-spare by running: zpool replace mypool A0 S1
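For reference, the full manual workaround looks roughly like this (device names taken from this report; the final detach is only needed once a permanent replacement has resilvered):

# attach the hot spare to the faulted vdev and start the resilver
zpool replace mypool A0 S1

# S1 should now show INUSE under "spares" and the resilver should be running
zpool status mypool

# later, once A0 has been physically replaced and resilvered, return S1 to the spare list
zpool detach mypool S1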

I have seen this multiple times and am unable to track down why ZED won't engage the hot-spare S1 and start the resilver. There are no logs indicating any errors or failures encountered in an attempt to engage the hot-spare drive.

Autoreplace is set to off, which, from everything I know and have read, is not a requirement for ZED to engage and resilver to a hot spare upon drive fault.

NAME      PROPERTY                       VALUE                          SOURCE
mypool  size                           109T                           -
mypool  capacity                       26%                            -
mypool  altroot                        -                              default
mypool  health                         DEGRADED                       -
mypool  guid                           2165552368650576290            -
mypool  version                        -                              default
mypool  bootfs                         -                              default
mypool  delegation                     on                             default
mypool  autoreplace                    off                            default
mypool  cachefile                      none                           local
mypool  failmode                       wait                           default
mypool  listsnapshots                  off                            default
mypool  autoexpand                     off                            default
mypool  dedupditto                     0                              default
mypool  dedupratio                     1.00x                          -
mypool  free                           80.2T                          -
mypool  allocated                      28.8T                          -
mypool  readonly                       off                            -
mypool  ashift                         12                             local
mypool  comment                        -                              default
mypool  expandsize                     -                              -
mypool  freeing                        0                              -
mypool  fragmentation                  1%                             -
mypool  leaked                         0                              -
mypool  multihost                      on                             local
mypool  feature@async_destroy          enabled                        local
mypool  feature@empty_bpobj            active                         local
mypool  feature@lz4_compress           active                         local
mypool  feature@multi_vdev_crash_dump  enabled                        local
mypool  feature@spacemap_histogram     active                         local
mypool  feature@enabled_txg            active                         local
mypool  feature@hole_birth             active                         local
mypool  feature@extensible_dataset     active                         local
mypool  feature@embedded_data          active                         local
mypool  feature@bookmarks              enabled                        local
mypool  feature@filesystem_limits      enabled                        local
mypool  feature@large_blocks           active                         local
mypool  feature@large_dnode            active                         local
mypool  feature@sha512                 enabled                        local
mypool  feature@skein                  enabled                        local
mypool  feature@edonr                  enabled                        local
mypool  feature@userobj_accounting     active                         local
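
Since autoreplace is off here and, per the above, should not be needed for hot-sparing, flipping it is only worth trying as an experiment to rule it out:

zpool get autoreplace mypool
zpool set autoreplace=on mypool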

I have seen references to needing to explicitly set a hot-spare action in zed.rc, but the 2014 references to this method do not appear in the 0.7.9 code tree, so it appears to be a red herring:

ZED_SPARE_ON_IO_ERRORS=1
AeonJJohnson commented 5 years ago

I've looked at zfs/cmd/zed/agents/zfs_retire.c and I cannot identify why the resilver to hot-spare doesn't execute.

Detailed zpool events: https://pastebin.com/13r4UDL6

tonyhutter commented 5 years ago

I tried to reproduce this in master and got mixed results. When I force-faulted a drive with zpool offline -f, the spare did not kick in:

  pool: tank
 state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
    Sufficient replicas exist for the pool to continue functioning in a
    degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
    repaired.
  scan: none requested
config:

    NAME                        STATE     READ WRITE CKSUM
    tank                        DEGRADED     0     0     0
      mirror-0                  DEGRADED     0     0     0
        /home/hutter/zfs/file1  ONLINE       0     0     0
        /home/hutter/zfs/file2  FAULTED      0     0     0  external device fault
    spares
      /home/hutter/zfs/file3    AVAIL

However, I did see the spare kick in on the auto_spare_001_pos test case, and when I manually used zinject to inject errors into the vdev and fault it:

  pool: tank
 state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
    Sufficient replicas exist for the pool to continue functioning in a
    degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
    repaired.
  scan: resilvered 290K in 0 days 00:00:00 with 0 errors on Mon Jul  1 15:40:17 2019
config:

    NAME                          STATE     READ WRITE CKSUM
    tank                          DEGRADED     0     0     0
      mirror-0                    DEGRADED     0     0     0
        spare-0                   DEGRADED     0     0     0
          /home/hutter/zfs/file1  FAULTED      0 1.28K    48  too many errors
          /home/hutter/zfs/file3  ONLINE       0     0     0
        /home/hutter/zfs/file2    ONLINE       0     0     0
    spares
      /home/hutter/zfs/file3      INUSE     currently in use

Jul 01 15:40:36 zed[5897]: eid=1347 class=io pool_guid=0x8EEB0F154CDC4AF8 vdev_path=/home/hutter/zfs/file1
Jul 01 15:40:37 zed[5910]: eid=1348 class=io pool_guid=0x8EEB0F154CDC4AF8 vdev_path=/home/hutter/zfs/file1
Jul 01 15:40:37 zed[5923]: eid=1349 class=io pool_guid=0x8EEB0F154CDC4AF8 vdev_path=/home/hutter/zfs/file1
Jul 01 15:40:37 zed[5936]: eid=1350 class=io pool_guid=0x8EEB0F154CDC4AF8 vdev_path=/home/hutter/zfs/file1
Jul 01 15:40:38 zed[5949]: eid=1351 class=io pool_guid=0x8EEB0F154CDC4AF8 vdev_path=/home/hutter/zfs/file1
Jul 01 15:40:38 zed[5962]: eid=1352 class=io pool_guid=0x8EEB0F154CDC4AF8 vdev_path=/home/hutter/zfs/file1
Jul 01 15:40:38 zed[5975]: eid=1353 class=io pool_guid=0x8EEB0F154CDC4AF8 vdev_path=/home/hutter/zfs/file1
Jul 01 15:40:39 zed[5988]: eid=1354 class=statechange pool_guid=0x8EEB0F154CDC4AF8 vdev_path=/home/hutter/zfs/file1 vdev_state=FAULTED
Jul 01 15:40:40 zed[6007]: eid=1355 class=scrub_start pool_guid=0x8EEB0F154CDC4AF8
Jul 01 15:40:40 zed[6020]: eid=1356 class=history_event pool_guid=0x8EEB0F154CDC4AF8
Jul 01 15:40:40 zed[6034]: eid=1357 class=history_event pool_guid=0x8EEB0F154CDC4AF8
Jul 01 15:40:41 zed[6048]: eid=1358 class=scrub_start pool_guid=0x8EEB0F154CDC4AF8
Jul 01 15:40:41 zed[6061]: eid=1359 class=history_event pool_guid=0x8EEB0F154CDC4AF8
Jul 01 15:40:42 zed[6075]: eid=1360 class=config_sync pool_guid=0x8EEB0F154CDC4AF8
Jul 01 15:40:43 zed[6088]: eid=1361 class=vdev_spare pool_guid=0x8EEB0F154CDC4AF8 vdev_path=/home/hutter/zfs/file3 vdev_state=ONLINE
Jul 01 15:40:43 zed[6101]: eid=1362 class=vdev_attach pool_guid=0x8EEB0F154CDC4AF8 vdev_path=/home/hutter/zfs/file3 vdev_state=ONLINE
AeonJJohnson commented 5 years ago

Would zpool offline -f generate the event message that the zed agents are looking for? In cmd/zed/agents/zfs_diagnosis.c it appears to be looking for io, checksum, or probe failures.

I was hoping there was a way, perhaps with zdb or by echoing a string to a module parameter handle in sysfs, to inject a message as a ZFS event in order to diagnose what zed is doing and how it engages.

richardelling commented 5 years ago

see zinject
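
For anyone else trying to reproduce, a minimal injection sketch along the lines of the auto_spare test cases (pool, vdev, and file paths here are placeholders; check zinject(8) for the exact flags on your version):

# inject I/O errors on every read/write to the target vdev
zinject -d /dev/disk/by-vdev/A0 -e io -T all -f 100 mypool

# drive some I/O so the errors accumulate and the diagnosis engine faults the vdev
dd if=/dev/urandom of=/mypool/zinject-test bs=1M count=64
zpool scrub mypool

# clear all injection handlers when done
zinject -c all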

tonyhutter commented 5 years ago

So I just tried zpool offline -f again, and it did activate the spare. Then I rebooted my VM, tried it again, and it didn't activate the spare. I'll need to do some more digging...
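
A rough retest loop for that scenario (a sketch using the file vdevs from the status output above; the detach is only needed when the spare did kick in):

# force-fault the device; the retire agent should attach the spare
zpool offline -f tank /home/hutter/zfs/file2
zpool status tank

# reset for the next attempt
zpool clear tank /home/hutter/zfs/file2
zpool detach tank /home/hutter/zfs/file3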

tonyhutter commented 5 years ago

I suspect what we really need is a "test spare activation after a drive has disappeared" test case. The current test cases only test faults via injected IO/checksum errors. @AeonJJohnson mentioned that this is from a multipathed SAS array, which, if it's like the arrays I've seen, often has bad drives disappear and reappear. That may help us reproduce what @AeonJJohnson is seeing.
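
A quick manual sketch of that scenario (sdX and hostN are placeholders; assumes a scratch pool backed by real SCSI devices), using the kernel's device delete hook to make the drive vanish the way a dying multipathed disk tends to:

# make the backing disk disappear from the SCSI layer
echo 1 > /sys/block/sdX/device/delete

# ZED should see the removal/statechange and attach the spare
zpool status tank
zpool events -f | grep -E 'statechange|vdev_spare|vdev_attach'

# bring the disk back by rescanning its SCSI host
echo '- - -' > /sys/class/scsi_host/hostN/scan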

@AeonJJohnson did you happen to see any SCSI/multipath errors in dmesg at the time of the disk failure?

AeonJJohnson commented 5 years ago

There were SCSI errors (hardware/media errors on writes):

Jun 28 00:02:17 myhost kernel: sd 0:0:124:0: [sda] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jun 28 00:02:17 myhost kernel: sd 0:0:124:0: [sda] tag#0 Sense Key : Hardware Error [current] [descriptor]
Jun 28 00:02:17 myhost kernel: sd 0:0:124:0: [sda] tag#0 ASC=0x44 <<vendor>>ASCQ=0xf9
Jun 28 00:02:17 myhost kernel: sd 0:0:124:0: [sda] tag#0 CDB: Write(10) 2a 00 00 00 00 3f 00 00 01 00
Jun 28 00:02:17 myhost kernel: sd 0:0:124:0: [sda] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jun 28 00:02:17 myhost kernel: sd 0:0:124:0: [sda] tag#0 Sense Key : Hardware Error [current] [descriptor]
Jun 28 00:02:17 myhost kernel: sd 0:0:124:0: [sda] tag#0 ASC=0x44 <<vendor>>ASCQ=0xf9
Jun 28 00:02:17 myhost kernel: sd 0:0:124:0: [sda] tag#0 CDB: Read(10) 28 00 74 70 24 c2 00 00 02 00
Jun 28 00:02:19 myhost kernel: sd 0:0:124:0: [sda] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jun 28 00:02:19 myhost kernel: sd 0:0:124:0: [sda] tag#0 Sense Key : Hardware Error [current] [descriptor]
Jun 28 00:02:19 myhost kernel: sd 0:0:124:0: [sda] tag#0 ASC=0x44 <<vendor>>ASCQ=0xf9
Jun 28 00:02:19 myhost kernel: sd 0:0:124:0: [sda] tag#0 CDB: Read(10) 28 00 00 00 00 42 00 00 02 00
Jun 28 00:02:19 myhost kernel: sd 0:0:124:0: [sda] tag#1 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jun 28 00:02:19 myhost kernel: sd 0:0:124:0: [sda] tag#1 Sense Key : Hardware Error [current] [descriptor]
Jun 28 00:02:19 myhost kernel: sd 0:0:124:0: [sda] tag#1 ASC=0x44 <<vendor>>ASCQ=0xf9
Jun 28 00:02:19 myhost kernel: sd 0:0:124:0: [sda] tag#1 CDB: Read(10) 28 00 74 70 25 02 00 00 02 00
...
Jun 28 00:02:17 myhost kernel: sd 11:0:123:0: [sddd] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jun 28 00:02:17 myhost kernel: sd 11:0:123:0: [sddd] tag#0 Sense Key : Hardware Error [current] [descriptor]
Jun 28 00:02:17 myhost kernel: sd 11:0:123:0: [sddd] tag#0 ASC=0x44 <<vendor>>ASCQ=0xf9
Jun 28 00:02:17 myhost kernel: sd 11:0:123:0: [sddd] tag#0 CDB: Read(10) 28 00 00 00 00 42 00 00 02 00
Jun 28 00:02:17 myhost kernel: sd 11:0:123:0: [sddd] tag#1 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jun 28 00:02:17 myhost kernel: sd 11:0:123:0: [sddd] tag#1 Sense Key : Hardware Error [current] [descriptor]
Jun 28 00:02:17 myhost kernel: sd 11:0:123:0: [sddd] tag#1 ASC=0x44 <<vendor>>ASCQ=0xf9
Jun 28 00:02:17 myhost kernel: sd 11:0:123:0: [sddd] tag#1 CDB: Read(10) 28 00 74 70 25 02 00 00 02 00
Jun 28 00:02:19 myhost kernel: sd 11:0:123:0: [sddd] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jun 28 00:02:19 myhost kernel: sd 11:0:123:0: [sddd] tag#0 Sense Key : Hardware Error [current] [descriptor]
Jun 28 00:02:19 myhost kernel: sd 11:0:123:0: [sddd] tag#0 ASC=0x44 <<vendor>>ASCQ=0xf9
Jun 28 00:02:19 myhost kernel: sd 11:0:123:0: [sddd] tag#0 CDB: Read(10) 28 00 74 70 24 c2 00 00 02 00

And there were some multipath TUR checker errors as the drive started to take a dirt nap. Failures are rarely graceful.

Jun 28 00:02:40 myhost multipathd: 35000cca0990668d8: sda - tur checker reports path is down
Jun 28 00:02:40 myhost multipathd: checker failed path 8:0 in map 35000cca0990668d8
Jun 28 00:02:40 myhost kernel: device-mapper: multipath: Failing path 8:0.
Jun 28 00:02:40 myhost multipathd: 35000cca0990668d8: remaining active paths: 1
Jun 28 00:02:49 myhost multipathd: 35000cca0990668d8: sddd - tur checker reports path is down
Jun 28 00:02:49 myhost kernel: device-mapper: multipath: Failing path 70:176.
Jun 28 00:02:49 myhost multipathd: checker failed path 70:176 in map 35000cca0990668d8
Jun 28 00:02:49 myhost multipathd: 35000cca0990668d8: remaining active paths: 0

Then correlate the above time stamps with the zpool events time stamps:

Jun 28 2019 00:02:17.499553228 ereport.fs.zfs.io
Jun 28 2019 00:02:17.755552169 ereport.fs.zfs.io
Jun 28 2019 00:02:17.766552123 ereport.fs.zfs.io
Jun 28 2019 00:02:17.766552123 ereport.fs.zfs.probe_failure
Jun 28 2019 00:02:19.102546594 ereport.fs.zfs.io
Jun 28 2019 00:02:19.103546590 ereport.fs.zfs.io
Jun 28 2019 00:02:19.107546574 ereport.fs.zfs.io
Jun 28 2019 00:02:19.107546574 ereport.fs.zfs.probe_failure
Jun 28 2019 00:02:19.107546574 resource.fs.zfs.statechange
Jun 28 2019 00:02:40.751457010 resource.fs.zfs.statechange
Jun 28 2019 00:02:40.758456981 ereport.fs.zfs.io
Jun 28 2019 00:02:40.760456973 ereport.fs.zfs.io
Jun 28 2019 00:02:40.761456969 ereport.fs.zfs.io
Jun 28 2019 00:02:40.761456969 ereport.fs.zfs.probe_failure
Jun 28 2019 00:02:40.761456969 resource.fs.zfs.statechange
Jun 28 2019 00:02:41.737452930 sysevent.fs.zfs.config_sync
Jun 28 2019 00:02:41.738452926 resource.fs.zfs.statechange
Jun 28 2019 00:02:41.739452921 ereport.fs.zfs.io
Jun 28 2019 00:02:41.740452917 ereport.fs.zfs.io
Jun 28 2019 00:02:41.741452913 ereport.fs.zfs.io
Jun 28 2019 00:02:41.741452913 ereport.fs.zfs.probe_failure
Jun 28 2019 00:02:41.741452913 resource.fs.zfs.statechange
Jun 28 2019 00:02:42.204450997 sysevent.fs.zfs.config_sync
Jun 28 2019 00:02:49.714419921 resource.fs.zfs.statechange
Jun 28 2019 00:02:49.716419912 ereport.fs.zfs.io
Jun 28 2019 00:02:49.717419908 ereport.fs.zfs.io
Jun 28 2019 00:02:49.719419900 ereport.fs.zfs.io
Jun 28 2019 00:02:49.719419900 ereport.fs.zfs.probe_failure
Jun 28 2019 00:02:49.719419900 resource.fs.zfs.statechange
Jun 28 2019 00:02:50.212417860 sysevent.fs.zfs.config_sync
Jun 28 2019 00:02:50.226417802 resource.fs.zfs.statechange
Jun 28 2019 00:02:50.226417802 ereport.fs.zfs.io
Jun 28 2019 00:02:50.226417802 ereport.fs.zfs.io
Jun 28 2019 00:02:50.230417785 ereport.fs.zfs.io
Jun 28 2019 00:02:50.230417785 ereport.fs.zfs.probe_failure
Jun 28 2019 00:02:50.230417785 resource.fs.zfs.statechange
Jun 28 2019 00:02:50.674415948 sysevent.fs.zfs.config_sync
AeonJJohnson commented 5 years ago

Here is a pastebin of the zpool events -v

I would think that ZED would've seen what it needed and engaged the S1 hot-spare on the statechange at 00:02:19.107546574.

Jun 28 2019 00:02:19.107546574 ereport.fs.zfs.probe_failure
        class = "ereport.fs.zfs.probe_failure"
        ena = 0x68ef16d804700c01
        detector = (embedded nvlist)
                version = 0x0
                scheme = "zfs"
                pool = 0x1e0d9667dbec61a2
                vdev = 0x11475c335622fbc9
        (end detector)
        pool = "mypool"
        pool_guid = 0x1e0d9667dbec61a2
        pool_state = 0x0
        pool_context = 0x0
        pool_failmode = "wait"
        vdev_guid = 0x11475c335622fbc9
        vdev_type = "disk"
        vdev_path = "/dev/disk/by-vdev/A0"
        vdev_devid = "dm-uuid-mpath-35000cca0990668d8"
        vdev_enc_sysfs_path = "/sys/class/enclosure/0:0:123:0/SLOT 00,VAG3J8KL            "
        vdev_ashift = 0xc
        vdev_complete_ts = 0x6d68ef16cd896
        vdev_delta_ts = 0x5babb3
        vdev_read_errors = 0x6
        vdev_write_errors = 0x1
        vdev_cksum_errors = 0x0
        parent_guid = 0x21e530e68cb794cf
        parent_type = "raidz"
        vdev_spare_paths = "/dev/disk/by-vdev/S1"
        vdev_spare_guids = 0x8637292f9b87e99e
        prev_state = 0x0
        time = 0x5d15914b 0x66907ce
        eid = 0x8c

Jun 28 2019 00:02:19.107546574 resource.fs.zfs.statechange
        version = 0x0
        class = "resource.fs.zfs.statechange"
        pool = "mypool"
        pool_guid = 0x1e0d9667dbec61a2
        pool_state = 0x0
        pool_context = 0x0
        vdev_guid = 0x11475c335622fbc9
        vdev_state = "FAULTED" (0x5)
        vdev_path = "/dev/disk/by-vdev/A0"
        vdev_devid = "dm-uuid-mpath-35000cca0990668d8"
        vdev_physpath = "A0"
        vdev_enc_sysfs_path = "/sys/class/enclosure/0:0:123:0/SLOT 00,VAG3J8KL            "
        vdev_laststate = "ONLINE" (0x7)
        time = 0x5d15914b 0x66907ce
        eid = 0x8d

ZED clearly saw something it liked, as it turned on the drive slot fault LED at 00:02:19:

Jun 28 00:02:17 myhost zed: eid=133 class=io pool_guid=0x1E0D9667DBEC61A2 vdev_path=/dev/disk/by-vdev/A0
Jun 28 00:02:17 myhost zed: eid=134 class=io pool_guid=0x1E0D9667DBEC61A2 vdev_path=/dev/disk/by-vdev/A0
Jun 28 00:02:17 myhost zed: eid=135 class=io pool_guid=0x1E0D9667DBEC61A2 vdev_path=/dev/disk/by-vdev/A0
Jun 28 00:02:17 myhost zed: eid=136 class=probe_failure pool_guid=0x1E0D9667DBEC61A2 vdev_path=/dev/disk/by-vdev/A0
Jun 28 00:02:19 myhost zed: eid=137 class=io pool_guid=0x1E0D9667DBEC61A2 vdev_path=/dev/disk/by-vdev/A0
Jun 28 00:02:19 myhost zed: eid=138 class=io pool_guid=0x1E0D9667DBEC61A2 vdev_path=/dev/disk/by-vdev/A0
Jun 28 00:02:19 myhost zed: eid=139 class=io pool_guid=0x1E0D9667DBEC61A2 vdev_path=/dev/disk/by-vdev/A0
Jun 28 00:02:19 myhost zed: eid=140 class=probe_failure pool_guid=0x1E0D9667DBEC61A2 vdev_path=/dev/disk/by-vdev/A0
Jun 28 00:02:19 myhost zed: eid=141 class=statechange pool_guid=0x1E0D9667DBEC61A2 vdev_path=/dev/disk/by-vdev/A0 vdev_state=FAULTED
Jun 28 00:02:19 myhost zed: vdev A0 set '/sys/class/enclosure/0:0:123:0/SLOT 00,VAG3J8KL            /fault' LED to 1
gdevenyi commented 4 years ago

I've also recently experienced a failure to activate a spare on 0.7.13 under similar conditions (multipathd handling the drives). I can't offer zevents because a recent zfs recv spilled over the events buffer.

stale[bot] commented 3 years ago

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

AeonJJohnson commented 3 years ago

poke

This still appears to be a defect; others are reporting mixed results. We are currently seeing it at customer sites as well.

Any thoughts on what/where to dig on this?

behlendorf commented 3 years ago

@AeonJJohnson we have made a couple of fixes in this area (4b5c9d9f976375ca4f075a533d2b6f0c9a14cf4a 0aacde2e9a818942f4e25ab765b3a4849aac6de8 32366649d3164e994def7e5a26765ca6bc1e2315) over the last year or so. If you can reproduce this on demand, I think it'd be worth trying the 2.1-rc6 release. In particular, I think 0aacde2e9a818942f4e25ab765b3a4849aac6de8 could sort this out, since it updates the kernel modules to explicitly issue a removal event when they determine the vdev is gone. The ZED should already have been getting a udev notification for this, but if for some reason it got lost, the spare might not get kicked in (and there's no harm in sending it twice).

The statechange event itself isn't enough to cause a hot spare to be kicked in. If you're willing to dig into the code, I'd suggest starting with the zfs_retire_recv() function; this is what should be called when the ZED determines a device should get replaced by a hot spare.
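
While reproducing, one way to see whether the retire agent actually acted (rather than only the diagnosis/LED side) is to follow the event stream and watch for the spare attach that zfs_retire_recv() should trigger, roughly:

# follow pool events while inducing the fault; a working path shows the FAULTED
# statechange followed by vdev_spare/vdev_attach for the spare, as in the log above
zpool events -f | grep -E 'statechange|vdev_spare|vdev_attach'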

jschwartz-cray commented 3 years ago

Another potentially related change is https://github.com/openzfs/zfs/pull/11355.

stale[bot] commented 2 years ago

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.