openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

statechange event not always generated when VDEV state changes #9437

Open cvoltz opened 5 years ago

cvoltz commented 5 years ago

System information

Type Version/Name
Distribution Name CentOS
Distribution Version 7.6
Linux Kernel 3.10.0-957.10.1
Architecture x86_64
ZFS Version 0.7.13-1
SPL Version 0.7.13-1

Problem

When a drive in a pool becomes FAULTED (e.g., due to I/O errors) or is taken OFFLINE (e.g., by running the zpool offline command), the resource.fs.zfs.statechange event is generated with vdev_state set appropriately. If the drive is brought back online (e.g., by running the zpool online command), the resource.fs.zfs.statechange event is generated with vdev_state set to ONLINE. However, if the drive is replaced using the zpool replace command, no resource.fs.zfs.statechange event is generated.

Lustre 2.11 added the statechange-lustre.sh ZEDLET, which adjusts the obdfilter.*.degraded property for a target when the pool's state changes: it sets the degraded property when the pool is DEGRADED and clears it when the pool is ONLINE. Since ZFS does not always generate the state change event, the target's degraded property is sometimes left set even after the pool returns to ONLINE, which reduces the performance of the Lustre filesystem.
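For context, a ZEDLET of this kind is roughly shaped like the sketch below. This is not the actual statechange-lustre.sh shipped with Lustre: the ZEVENT_* variables follow ZED's convention of exporting event fields to ZEDLETs, the pool health is read back with zpool list (as in step 4 below), and the mapping from the pool to its Lustre targets is simplified to every obdfilter target on the node.

    #!/bin/sh
    # Sketch of a statechange ZEDLET mirroring the behavior described above;
    # not the real statechange-lustre.sh.
    [ "${ZEVENT_SUBCLASS}" = "statechange" ] || exit 0
    [ -n "${ZEVENT_POOL}" ] || exit 0

    case "$(zpool list -H -o health "${ZEVENT_POOL}")" in
        DEGRADED) degraded=1 ;;
        ONLINE)   degraded=0 ;;
        *)        exit 0 ;;
    esac

    # Set or clear the degraded flag on the OST targets (simplified here to
    # all obdfilter targets for illustration).
    lctl set_param obdfilter.*.degraded="${degraded}"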

See https://jira.whamcloud.com/browse/LU-12836 for more information (including output from zpool events -v).

Steps to reproduce

  1. Create a pool:
    pool=ost04
    zpool create $pool \
     -o ashift=12 \
     -o cachefile=none \
     -O canmount=off \
     -O recordsize=1024K \
     -f \
     raidz2 /dev/mapper/d8000_sep500C0FF03C1AC73E_bay0{41..50}-0
  2. Select a drive to fail from the pool:
    bad_drive=d8000_sep500C0FF03C1AC73E_bay050-0
  3. Select an unused drive to use as the replacement drive:
    spare_drive=d8000_sep500C0FF03C1AC73E_bay101-0
  4. Verify the pool is ONLINE:
    zpool list -H -o name,health $pool
  5. Wipe the replacement drive so it looks like an unused drive:
    wipefs --all --force /dev/mapper/$spare_drive
  6. Clear the event history (just to make it easier to see what events were generated by the test):
    zpool events -c
  7. Simulate a drive failure for the selected drive by taking it offline:
    zpool offline $pool $bad_drive
  8. Wait for the pool to become DEGRADED:
    ruby -r timeout <<'EOF'
    Timeout::timeout(45) do
     loop do
       print '.'
       break if `zpool status` =~ /DEGRADED/
     end
    end
    EOF
  9. Replace the "failed" drive:
    zpool replace $pool $bad_drive $spare_drive
  10. Wait for resilvering to finish:
    ruby -r timeout <<'EOF'
    Timeout::timeout(45) do
     loop do
       print '.'
       break if `zpool events` =~ /sysevent.fs.zfs.resilver_finish/
     end
    end
    EOF
  11. Verify the pool is ONLINE:
    zpool list -H -o name,health $pool
  12. Dump the event history:
    zpool events -v

    and notice that it only contains the statechange event for the device going OFFLINE, without a corresponding statechange event for the replacement device coming ONLINE. The output should have included an event like this:

    Oct  8 2019 09:29:58.726502922 resource.fs.zfs.statechange
            version = 0x0
            class = "resource.fs.zfs.statechange"
            pool = "ost04"
            pool_guid = 0x8159dca79b3945a4
            pool_state = 0x0
            pool_context = 0x0
            vdev_guid = 0x4b6ac5c4c8d5cb1a
            vdev_state = "ONLINE" (0x7)
            vdev_path = "/dev/mapper/d8000_sep500C0FF03C1AC73E_bay101-0"
            vdev_devid = "dm-uuid-mpath-35000c500a63e36f7"
            vdev_laststate = "OFFLINE" (0x2)
            time = 0x5d9c9d66 0x2b4d8e0a
            eid = 0x7c

    Changing the zpool replace $pool $bad_drive $spare_drive command in step 9 to zpool online $pool $bad_drive results in the resource.fs.zfs.statechange event being generated when the device comes back ONLINE (a quick way to script this comparison is sketched after this list).
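A wrapper along these lines can script the comparison between the two recovery paths. It is only a sketch, not the test-degraded-drive script mentioned below; the check_statechange name, the 60-second wait, and the grep pattern are illustrative choices.

    # Clear the event history, run the recovery command passed as arguments,
    # then poll for a statechange event whose vdev_state is ONLINE.
    check_statechange() {
        zpool events -c >/dev/null
        "$@"
        for _ in $(seq 60); do
            if zpool events -v |
               grep -A 12 'resource\.fs\.zfs\.statechange' |
               grep -q 'vdev_state = "ONLINE"'; then
                echo "statechange to ONLINE was generated"
                return 0
            fi
            sleep 1
        done
        echo "no statechange to ONLINE event was generated"
        return 1
    }

    # Reports the event:
    #   check_statechange zpool online $pool $bad_drive
    # Reproduces the bug (no event reported):
    #   check_statechange zpool replace $pool $bad_drive $spare_drive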

The Lustre issue includes the test-degraded-drive script which can be used for testing.

While we are looking at this specific scenario, we should also investigate whether there are any other scenarios where a VDEV can transition to ONLINE without a corresponding state change event being generated.
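One low-effort way to run that audit is to leave the verbose event stream open while exercising the various transitions, for example (a sketch, not part of the issue):

    # Follow the event stream here while running zpool offline/online/
    # replace/clear against a test pool in another shell, then compare the
    # printed transitions against the operations actually performed.
    zpool events -vf | grep --line-buffered -E \
        'resource\.fs\.zfs\.statechange|vdev_path|vdev_state|vdev_laststate'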

ofaaland commented 5 years ago

@tonyhutter can you take a look at this? Thanks

cvoltz commented 5 years ago

I have a fix for this. I'll generate a PR for it as soon as I am finished running the ZFS test suite on it.

stale[bot] commented 4 years ago

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

behlendorf commented 3 years ago

I'm reopening this since, to my knowledge, it hasn't yet been addressed.

stale[bot] commented 2 years ago

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.