Closed: WRMSRwasTaken closed this issue 1 year ago
Yeah, those have been broken to various degrees for a while, AFAIK. I worked around one failure in ecaccf764a8ccbece8bb411f56b81ef5858fe710 a bit ago when someone reported them broken, but haven't had any personal need to go figure out why they're broken in general.
My guess is that ZFS isn't detecting your enclosure sysfs path, possibly since you're using partitions. Can you post the output for one of your disks with zdb -C? If ZFS was able to successfully find the path to your enclosure, you should see the vdev_enc_sysfs_path:
$ sudo zdb -C
...
        children[0]:
            type: 'disk'
            id: 0
            guid: 6964075262215447939
            path: '/dev/disk/by-vdev/U0'
            vdev_enc_sysfs_path: '/sys/class/enclosure/0:0:122:0/0'
            whole_disk: 0
            DTL: 1386
            create_txg: 4
            com.delphix:vdev_zap_leaf: 130
# zdb -C | grep enc
#
Yeah, it really does seem that ZFS isn't detecting my enclosure path because I'm using partitions instead of the whole disk. Interestingly, VDEV_UPATH seems to be set correctly in the scripts.
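To double-check that the kernel itself exposes the enclosure links (independently of ZFS), listing the standard SES sysfs symlinks should show each disk's slot; a quick sketch:

# List every disk's enclosure slot link, if the kernel exposes one.
# Uses the standard SES sysfs layout:
#   /sys/class/block/<disk>/device/enclosure_device:<slot>
for link in /sys/class/block/sd*/device/enclosure_device:*; do
    [ -e "$link" ] && echo "$link -> $(realpath "$link")"
done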
Is there anything I can do on my side for ZFS to still detect the enclosure?
> Is there anything I can do on my side for ZFS to still detect the enclosure?

I don't think so. However, I don't think it would be that hard to modify zfs_get_enclosure_sysfs_path() to make it work.
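Roughly, a partition-aware lookup would only need to resolve the partition to its parent disk before following the enclosure_device link. A minimal shell sketch of that idea (sdd1 is just an example device, not a quote from the current code):

# Hypothetical sketch: resolve a partition to its parent disk, then
# follow the disk's standard enclosure_device symlink in sysfs.
part=sdd1
disk=$(basename "$(realpath "/sys/class/block/$part/..")")   # sdd1 -> sdd
realpath "/sys/class/block/$disk/device/enclosure_device:"*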
Interesting, I did a kernel upgrade from 6.0.7-arch1-1 to 6.0.10-arch2-1 yesterday, and things started to work:
# zdb -C | grep enc
vdev_enc_sysfs_path: '/sys/class/enclosure/17:0:12:0/Slot00'
vdev_enc_sysfs_path: '/sys/class/enclosure/17:0:12:0/Slot01'
vdev_enc_sysfs_path: '/sys/class/enclosure/17:0:12:0/Slot02'
vdev_enc_sysfs_path: '/sys/class/enclosure/17:0:12:0/Slot03'
vdev_enc_sysfs_path: '/sys/class/enclosure/17:0:12:0/Slot05'
vdev_enc_sysfs_path: '/sys/class/enclosure/17:0:12:0/Slot04'
vdev_enc_sysfs_path: '/sys/class/enclosure/17:0:12:0/Slot10'
vdev_enc_sysfs_path: '/sys/class/enclosure/17:0:12:0/Slot09'
vdev_enc_sysfs_path: '/sys/class/enclosure/17:0:12:0/Slot06'
vdev_enc_sysfs_path: '/sys/class/enclosure/17:0:12:0/Slot07'
vdev_enc_sysfs_path: '/sys/class/enclosure/17:0:12:0/Slot08'
vdev_enc_sysfs_path: '/sys/class/enclosure/17:0:12:0/Slot11'
  pool: hdd2
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P
  scan: resilvered 4.30G in 00:38:03 with 0 errors on Fri Dec  2 00:04:04 2022
config:

        NAME        STATE     READ WRITE CKSUM                       upath  fault_led
        hdd2        DEGRADED     0     0     0
          raidz1-0  DEGRADED     0     0     0
            sdm1    ONLINE       0     0     0                    /dev/sdm          0
            sdl1    DEGRADED     0     0     0  too many errors   /dev/sdl          2
            sdi1    ONLINE       0     0     0                    /dev/sdi          0
            sdj1    DEGRADED  3.19K    11     3  too many errors  /dev/sdj          2
            sdk1    ONLINE       0     0     0                    /dev/sdk          0
            sdn1    ONLINE       0     0     0                    /dev/sdn          0

errors: No known data errors
And indeed, the two fault LEDs were active on those two drive bays.
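For reference, the LEDs can also be inspected and cleared by hand through the same sysfs fault files that statechange-led.sh writes (slot path taken from the output above):

# Read and clear a bay's fault LED manually; the ZED script drives
# these same sysfs attributes.
slot=/sys/class/enclosure/17:0:12:0/Slot01
cat "$slot/fault"               # 1 = fault LED on, 0 = off
echo 0 | sudo tee "$slot/fault" # turn the fault LED off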
Guess I can close this?
System information
Describe the problem you're observing
I've had a disk faulted by ZFS today due to excessive read errors and wanted to replace it. Despite having set ZED_USE_ENCLOSURE_LEDS=1 in /etc/zfs/zed.d/zed.rc, the fault LED wasn't lit on any of those slots.

After digging a bit deeper, I found out that none of the scripts in /etc/zfs/zpool.d seem to work, for example locate_led (zpool status -L -c locate_led) or even slot (zpool status -L -c slot). Looking at those scripts, they all rely on the $VDEV_ENC_SYSFS_PATH environment variable being set, but in my case it's empty.

There might be a path issue on my system: looking at sdd, for example, the path to the enclosure device is /sys/class/block/sdd/device/enclosure_device:Slot01, with the realpath being /sys/devices/pci0000:20/0000:20:01.1/0000:21:00.0/host17/port-17:0/expander-17:0/port-17:0:8/end_device-17:0:8/target17:0:8/17:0:8:0/enclosure/17:0:8:0/Slot01. This folder has the locate and fault files.

My disks are connected to a SuperMicro BPN-SAS3-826EL1 backplane over an LSI 9405W-16i HBA. The backplane runs firmware 66.16.11.0 and the HBA runs firmware 23.00.00.00. The sg driver is loaded, and the backplane is recognized, as shown by lsscsi -g.

Unfortunately, the only thing I could find to look at is statechange-led.sh in /etc/zfs/zed.d, which has vdev_enc_sysfs_path=$(realpath "/sys/class/block/$dev/device/enclosure_device"*). Running realpath /sys/class/block/sdd/device/enclosure_device* in the shell gives me the long path above. Running zpool status -c upath,fault_led sets upath correctly, but fault_led prints -, as do all the other scripts relying on that environment variable.

If you can give me some hints on how to troubleshoot this, that'd be very much appreciated, as I'd really like to get those scripts to work on my system.
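One way to narrow it down is to run one of the zpool.d scripts by hand with the environment variables that zpool status -c would normally export, which separates the script logic from ZFS's own path detection. A rough sketch, assuming the script only needs these two variables (the sdd paths are from my system above):

# Invoke a zpool.d script directly with the env vars `zpool status -c`
# would normally provide, to test it independently of ZFS's detection.
export VDEV_UPATH=/dev/sdd
export VDEV_ENC_SYSFS_PATH=$(realpath /sys/class/block/sdd/device/enclosure_device:*)
/etc/zfs/zpool.d/fault_led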
Edit: I am using partitions instead of giving ZFS the whole disk (but the partition stretches over the whole disk), and I set up ZFS to use partition UUIDs (ls -l /dev/disk/by-partuuid) instead of the disk IDs (ls -l /dev/disk/by-id/), but this shouldn't be an issue, as the upath above is set correctly, no?

Describe how to reproduce the problem
None honestly.
Include any warning/errors/backtraces from the system logs
No logs to report.