openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

statechange event not always generated when VDEV state changes #9437

Open cvoltz opened 5 years ago

cvoltz commented 5 years ago

System information

Type Version/Name
Distribution Name CentOS
Distribution Version 7.6
Linux Kernel 3.10.0-957.10.1
Architecture x86_64
ZFS Version 0.7.13-1
SPL Version 0.7.13-1

Problem

When a drive in a pool becomes FAULTED (e.g., due to I/O errors) or is taken OFFLINE (e.g., by running the zpool offline command), the resource.fs.zfs.statechange event is generated with vdev_state set appropriately. If the drive is brought back online (e.g., by running the zpool online command), the resource.fs.zfs.statechange event is generated with vdev_state set to ONLINE. However, if the drive is replaced using the zpool replace command, no resource.fs.zfs.statechange event is generated.

Lustre 2.11 added the statechange-lustre.sh ZEDLET, which adjusts the obdfilter.*.degraded property for a target when the pool's state changes: it sets the degraded property when the pool is DEGRADED and clears it when the pool is ONLINE. Since ZFS does not always generate the state change event, the target's degraded property is sometimes left set even after the pool returns to ONLINE, which reduces the performance of the Lustre filesystem.
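For context, a ZEDLET of this kind is roughly shaped like the sketch below. This is not the actual statechange-lustre.sh shipped with Lustre: the ZEVENT_* variables follow ZED's convention of exporting event fields to ZEDLETs, the pool health is read back with zpool list (as in step 4 below), and the mapping from the pool to its Lustre targets is simplified to every obdfilter target on the node.

    #!/bin/sh
    # Sketch of a statechange ZEDLET mirroring the behavior described above;
    # not the real statechange-lustre.sh.
    [ "${ZEVENT_SUBCLASS}" = "statechange" ] || exit 0
    [ -n "${ZEVENT_POOL}" ] || exit 0

    case "$(zpool list -H -o health "${ZEVENT_POOL}")" in
        DEGRADED) degraded=1 ;;
        ONLINE)   degraded=0 ;;
        *)        exit 0 ;;
    esac

    # Set or clear the degraded flag on the OST targets (simplified here to
    # all obdfilter targets for illustration).
    lctl set_param obdfilter.*.degraded="${degraded}"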

See https://jira.whamcloud.com/browse/LU-12836 for more information (including output from zpool events -v).

Steps to reproduce

  1. Create a pool:
    pool=ost04
    zpool create $pool \
     -o ashift=12 \
     -o cachefile=none \
     -O canmount=off \
     -O recordsize=1024K \
     -f \
     raidz2 /dev/mapper/d8000_sep500C0FF03C1AC73E_bay0{41..50}-0
  2. Select a drive to fail from the pool:
    bad_drive=d8000_sep500C0FF03C1AC73E_bay050-0
  3. Select an unused drive to use as the replacement drive:
    spare_drive=d8000_sep500C0FF03C1AC73E_bay101-0
  4. Verify the pool is ONLINE:
    zpool list -H -o name,health $pool
  5. Wipe the replacement drive so it looks like an unused drive:
    wipefs --all --force /dev/mapper/$spare_drive
  6. Clear the event history (just to make it easier to see what events were generated by the test):
    zpool events -c
  7. Simulate a drive failure for the selected drive by taking it offline:
    zpool offline $pool $bad_drive
  8. Wait for the pool to become DEGRADED:
    ruby -r timeout <<'EOF'
    Timeout::timeout(45) do
     loop do
       print '.'
       break if `zpool status` =~ /DEGRADED/
     end
    end
    EOF
  9. Replace the "failed" drive:
    zpool replace $pool $bad_drive $spare_drive
  10. Wait for resilvering to finish:
    ruby -r timeout <<'EOF'
    Timeout::timeout(45) do
     loop do
       print '.'
       break if `zpool events` =~ /sysevent.fs.zfs.resilver_finish/
     end
    end
    EOF
  11. Verify the pool is ONLINE:
    zpool list -H -o name,health $pool
  12. Dump the event history:
    zpool events -v

    and notice that it only contains the statechange event for the device going OFFLINE, without a corresponding statechange event for the replacement device coming ONLINE. The output should have included an event like this:

    Oct  8 2019 09:29:58.726502922 resource.fs.zfs.statechange
            version = 0x0
            class = "resource.fs.zfs.statechange"
            pool = "ost04"
            pool_guid = 0x8159dca79b3945a4
            pool_state = 0x0
            pool_context = 0x0
            vdev_guid = 0x4b6ac5c4c8d5cb1a
            vdev_state = "ONLINE" (0x7)
            vdev_path = "/dev/mapper/d8000_sep500C0FF03C1AC73E_bay101-0"
            vdev_devid = "dm-uuid-mpath-35000c500a63e36f7"
            vdev_laststate = "OFFLINE" (0x2)
            time = 0x5d9c9d66 0x2b4d8e0a
            eid = 0x7c

    Changing the zpool replace $pool $bad_drive $spare_drive command in step 9 to zpool online $pool $bad_drive results in the resource.fs.zfs.statechange event being generated when the device comes back ONLINE (a quick way to script this comparison is sketched after this list).
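A wrapper along these lines can script the comparison between the two recovery paths. It is only a sketch, not the test-degraded-drive script mentioned below; the check_statechange name, the 60-second wait, and the grep pattern are illustrative choices.

    # Clear the event history, run the recovery command passed as arguments,
    # then poll for a statechange event whose vdev_state is ONLINE.
    check_statechange() {
        zpool events -c >/dev/null
        "$@"
        for _ in $(seq 60); do
            if zpool events -v |
               grep -A 12 'resource\.fs\.zfs\.statechange' |
               grep -q 'vdev_state = "ONLINE"'; then
                echo "statechange to ONLINE was generated"
                return 0
            fi
            sleep 1
        done
        echo "no statechange to ONLINE event was generated"
        return 1
    }

    # Reports the event:
    #   check_statechange zpool online $pool $bad_drive
    # Reproduces the bug (no event reported):
    #   check_statechange zpool replace $pool $bad_drive $spare_drive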

The Lustre issue includes the test-degraded-drive script which can be used for testing.

While we are looking at this specific scenario, we should also investigate whether there are any other scenarios where a VDEV can transition to ONLINE without a corresponding state change event being generated.
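One low-effort way to run that audit is to leave the verbose event stream open while exercising the various transitions, for example (a sketch, not part of the issue):

    # Follow the event stream here while running zpool offline/online/
    # replace/clear against a test pool in another shell, then compare the
    # printed transitions against the operations actually performed.
    zpool events -vf | grep --line-buffered -E \
        'resource\.fs\.zfs\.statechange|vdev_path|vdev_state|vdev_laststate'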

ofaaland commented 5 years ago

@tonyhutter can you take a look at this? Thanks

cvoltz commented 5 years ago

I have a fix for this. I'll generate a PR for it as soon as I am finished running the ZFS test suite on it.

stale[bot] commented 4 years ago

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

behlendorf commented 3 years ago

I'm reopening this since, to my knowledge, it hasn't yet been addressed.

stale[bot] commented 2 years ago

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.