Substantial rewrite at https://github.com/openzfs/zfs/pull/11526
@arshad512 and @AeonJJohnson it sounds like a regression was accidentally introduced with the recent vdev_id rework. Would you mind taking a look at this?
Can you clarify the following?
The rebuilt vdev_id script does depend on how enclosure firmware and HBA drivers populate sysfs with values. I have seen enclosures whose physical labeling (decals, etc.) on slots differs from what the firmware developers designate the slots as. For example, the external labeling starts with one (Slot 1) but the firmware designates that same slot as 0: zero-based in firmware, one-based in the external visual labeling.
If it was working and now is not, something has surely changed.
Can you post the output of the following command, changing the argument to reflect a drive that is not reporting correctly?
sh -x /usr/lib/udev/vdev_id -d sdX (where X equals the sd device instance not correctly reporting)
Be sure to include what you think that drive should be processed by vdev_id as.
This package is ideal for answering these questions: https://pypi.org/project/sasutils/
Can you clarify the following?
1. Model of HBA (specify which HBA is in host that works and which HBA in host where it's not working)
The mapping behaviour switches between working (zfs-0.8.3) and not working (zfs-2.1.0rc4) on the same host, so I don't think it's hardware dependent at all. These particular HBAs are LSI 9212-4i4e, IIRC.
2. Model of enclosure (same as above, which model works and which model does not)
As above, the behavior changes with the zfs version across a reboot; currently connected are an HP D2600 and an HP DL180G6 converted into a disk enclosure.
3. Last known version of ZFS where enclosure/drive enumeration and representation was correct
I have only checked the Ubuntu 20.04 build of zfs 0.8.3 (which works) and 2.1.0rc4 (which doesn't).
The rebuilt vdev_id script does depend on how enclosure firmware and HBA drivers populate sysfs with values. I have seen enclosures whose physical labeling (decals, etc.) on slots differs from what the firmware developers designate the slots as. For example, the external labeling starts with one (Slot 1) but the firmware designates that same slot as 0: zero-based in firmware, one-based in the external visual labeling.
Yup, enclosures map IDs all over the place, and it gets worse when you have internal HBAs with individual connectors that are not in the same order as the ports... That's why we need the slot mapping, to be able to bring order to the madness.
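To make that concrete, here is a rough vdev_id.conf sketch of the kind of remapping involved (reconstructed from the channel/slot lines that appear later in this thread, not our full config):

# channel <PCI slot> <HBA port> <channel name>
channel 1b:00.0  1  c3bay
# slot <slot as reported by the hardware> <remapped slot> [channel]
slot    4        8  c3bay
slot    0        1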
If it was working and now is not, something has surely changed.
Can you post the output of the following command, changing the argument to reflect a drive that is not reporting correctly?
sh -x /usr/lib/udev/vdev_id -d sdX (where X equals the sd device instance not correctly reporting)
# ls -l /dev/disk/by-vdev/c3bay4
lrwxrwxrwx 1 root root 9 Apr 27 13:46 /dev/disk/by-vdev/c3bay4 -> ../../sds
# sh -x /usr/lib/udev/vdev_id -d sds
+ PATH=/bin:/sbin:/usr/bin:/usr/sbin
+ CONFIG=/etc/zfs/vdev_id.conf
+ PHYS_PER_PORT=
+ DEV=
+ TOPOLOGY=
+ BAY=
+ ENCL_ID=
+ UNIQ_ENCL_ID=
+ getopts c:d:eg:jmp:h OPTION
+ DEV=sds
+ getopts c:d:eg:jmp:h OPTION
+ [ ! -r /etc/zfs/vdev_id.conf ]
+ [ -z sds ]
+ [ -z ]
+ awk ($1 == "topology") {print $2; exit} /etc/zfs/vdev_id.conf
+ TOPOLOGY=sas_direct
+ [ -z ]
+ awk ($1 == "slot") {print $2; exit} /etc/zfs/vdev_id.conf
+ BAY=bay
+ TOPOLOGY=sas_direct
+ [ = yes ]
+ alias_handler
+ DM_PART=
+ echo
+ grep -q -E p[0-9][0-9]*$
+ ID_VDEV=
+ [ -z ]
+ BAY=bay
+ sas_handler
+ [ -z ]
+ awk $1 == "phys_per_port" \
{print $2; exit} /etc/zfs/vdev_id.conf
+ PHYS_PER_PORT=4
+ PHYS_PER_PORT=4
+ echo 4
+ grep -q -E ^[0-9]+$
+ [ -z ]
+ awk $1 == "multipath" \
{print $2; exit} /etc/zfs/vdev_id.conf
+ MULTIPATH_MODE=no
+ [ -z ]
+ awk $1 == "multijbod" \
{print $2; exit} /etc/zfs/vdev_id.conf
+ MULTIJBOD_MODE=
+ [ no = yes ]
+ echo sds
+ grep -q ^/devices/
+ udevadm info -q path -p /sys/block/sds
+ sys_path=/devices/pci0000:00/0000:00:03.0/0000:1b:00.0/host1/port-1:4/end_device-1:4/target1:0:16/1:0:16:0/block/sds
+ echo /devices/pci0000:00/0000:00:03.0/0000:1b:00.0/host1/port-1:4/end_device-1:4/target1:0:16/1:0:16:0/block/sds
+ tr /
+ set -- devices pci0000:00 0000:00:03.0 0000:1b:00.0 host1 port-1:4 end_device-1:4 target1:0:16 1:0:16:0 block sds
+ num_dirs=11
+ scsi_host_dir=/sys
+ i=1
+ [ 1 -le 11 ]
+ eval echo ${1}
+ echo devices
+ d=devices
+ scsi_host_dir=/sys/devices
+ echo devices
+ grep -q -E ^host[0-9]+$
+ i=2
+ [ 2 -le 11 ]
+ eval echo ${2}
+ echo pci0000:00
+ d=pci0000:00
+ scsi_host_dir=/sys/devices/pci0000:00
+ echo pci0000:00
+ grep -q -E ^host[0-9]+$
+ i=3
+ [ 3 -le 11 ]
+ eval echo ${3}
+ echo 0000:00:03.0
+ d=0000:00:03.0
+ scsi_host_dir=/sys/devices/pci0000:00/0000:00:03.0
+ echo 0000:00:03.0
+ grep -q -E ^host[0-9]+$
+ i=4
+ [ 4 -le 11 ]
+ eval echo ${4}
+ echo 0000:1b:00.0
+ d=0000:1b:00.0
+ scsi_host_dir=/sys/devices/pci0000:00/0000:00:03.0/0000:1b:00.0
+ echo 0000:1b:00.0
+ grep -q -E ^host[0-9]+$
+ i=5
+ [ 5 -le 11 ]
+ eval echo ${5}
+ echo host1
+ d=host1
+ scsi_host_dir=/sys/devices/pci0000:00/0000:00:03.0/0000:1b:00.0/host1
+ echo host1
+ grep -q -E ^host[0-9]+$
+ break
+ echo host1
+ awk -F/ { gsub("host","",$NF); print $NF}
+ HOSTCHAN=1
+ [ 5 = 11 ]
+ eval echo ${4}
+ echo 0000:1b:00.0
+ awk -F: {print $2":"$3}
+ PCI_ID=1b:00.0
+ port_dir=/sys/devices/pci0000:00/0000:00:03.0/0000:1b:00.0/host1
+ j=6
+ i=6
+ [ 6 -le 6 ]
+ eval echo ${6}
+ echo port-1:4
+ port_dir=/sys/devices/pci0000:00/0000:00:03.0/0000:1b:00.0/host1/port-1:4
+ i=7
+ [ 7 -le 6 ]
+ ls -d /sys/devices/pci0000:00/0000:00:03.0/0000:1b:00.0/host1/port-1:4/phy-1:4
+ head -1
+ awk -F: {print $NF}
+ PHY=4
+ [ -z 4 ]
+ PORT=1
+ end_device_dir=/sys/devices/pci0000:00/0000:00:03.0/0000:1b:00.0/host1/port-1:4
+ [ 7 -lt 11 ]
+ eval echo ${7}
+ echo end_device-1:4
+ d=end_device-1:4
+ end_device_dir=/sys/devices/pci0000:00/0000:00:03.0/0000:1b:00.0/host1/port-1:4/end_device-1:4
+ echo end_device-1:4
+ grep -q ^end_device
+ end_device_dir=/sys/devices/pci0000:00/0000:00:03.0/0000:1b:00.0/host1/port-1:4/end_device-1:4/sas_device/end_device-1:4
+ break
+ SLOT=
+ cat /sys/devices/pci0000:00/0000:00:03.0/0000:1b:00.0/host1/port-1:4/end_device-1:4/sas_device/end_device-1:4/bay_identifier
+ SLOT=4
+ [ -z 4 ]
+ [ = yes ]
+ map_channel 1b:00.0 1
+ MAPPED_CHAN=
+ PCI_ID=1b:00.0
+ PORT=1
+ awk -v pciID=1b:00.0 -v port=1 $1 == "channel" && $2 == pciID && $3 == port \
{print $4} /etc/zfs/vdev_id.conf
+ MAPPED_CHAN=c3bay
+ printf %s c3bay
+ CHAN=c3bay
+ map_slot 4 c3bay
+ LINUX_SLOT=4
+ CHANNEL=c3bay
+ awk $1 == "slot" && $2 == "${LINUX_SLOT}" && \
$4 ~ /^${CHANNEL}$|^$/ { print $3; exit} /etc/zfs/vdev_id.conf
+ MAPPED_SLOT=
+ [ -z ]
+ MAPPED_SLOT=4
+ printf %d 4
+ SLOT=4
+ [ -z c3bay ]
+ echo c3bay4
+ ID_VDEV=c3bay4
+ [ -n c3bay4 ]
+ echo ID_VDEV=c3bay4
ID_VDEV=c3bay4
+ echo ID_VDEV_PATH=disk/by-vdev/c3bay4
ID_VDEV_PATH=disk/by-vdev/c3bay4
Be sure to include what you think that drive should be processed by vdev_id as.
The vdev_id.conf line:
slot 4 8 c3bay
should trigger and map this drive as c3bay8; right now this is obviously not triggered, as MAPPED_SLOT is empty above.
From a quick look it could be that the quoting is too "hard", so the LINUX_SLOT etc. variables aren't expanded as they should be on the awk command line?
@ZNikke , Thanks for the debug output.
should trigger and map this drive as c3bay8 - right now this is obviously not triggered as MAPPED_SLOT is empty above.
Right
From a quick look it could be that the quoting is too "hard", so the LINUX_SLOT etc. variables aren't expanded as they should be on the awk command line?
I will update you on this. (However, this should not be the case)
@ZNikke
I will update you on this. (However, this should not be the case)
You are correct. When the code was moved from `` (backtick command substitution) to the $() form, the expansion was not handled. I will also skim through the rest of the code to see if this change is required anywhere else. I will update the patch with the changes.
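For anyone following along, a minimal sketch of the pitfall (hypothetical standalone commands, not the actual vdev_id code): the quoting ends up such that awk receives the literal string ${LINUX_SLOT} instead of its value, as the trace above shows, so the comparison can never match. One way to avoid this is to pass the value in with awk -v, so no shell expansion inside the program text is needed.

LINUX_SLOT=4
CONFIG=/etc/zfs/vdev_id.conf

# Broken: the single-quoted program keeps the shell from expanding ${LINUX_SLOT},
# so awk compares $2 against the literal string "${LINUX_SLOT}" and never matches.
MAPPED_SLOT=$(awk '$1 == "slot" && $2 == "${LINUX_SLOT}" {print $3; exit}' "$CONFIG")

# Working alternative: hand the shell value to awk with -v.
MAPPED_SLOT=$(awk -v linux_slot="$LINUX_SLOT" \
    '$1 == "slot" && $2 == linux_slot {print $3; exit}' "$CONFIG")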
@ZNikke, Could you please try patch https://github.com/openzfs/zfs/pull/11959 and let us know.
Progress: now it processes the per-channel mappings, but still not the default/fallback mapping, i.e. enc10 in our config:
# ls --hide="*-part*" /dev/disk/by-vdev/
c0v1@ c4bay12@ enc10d1@ enc10d4@ enc10d9@ enc9d2@ enc9d7@
c3bay10@ c4bay13@ enc10d10@ enc10d5@ enc9d1@ enc9d3@ enc9d8@
c3bay11@ c4bay14@ enc10d11@ enc10d6@ enc9d10@ enc9d4@ enc9d9@
c3bay8@ c4bay15@ enc10d2@ enc10d7@ enc9d11@ enc9d5@
c3bay9@ enc10d0@ enc10d3@ enc10d8@ enc9d12@ enc9d6@
Debug output on one of the enc10 disks:
# ls -la /dev/disk/by-vdev/enc10d0
lrwxrwxrwx 1 root root 9 Apr 28 10:07 /dev/disk/by-vdev/enc10d0 -> ../../sdx
# sh -x /usr/lib/udev/vdev_id -d sdx
+ PATH=/bin:/sbin:/usr/bin:/usr/sbin
+ CONFIG=/etc/zfs/vdev_id.conf
+ PHYS_PER_PORT=
+ DEV=
+ TOPOLOGY=
+ BAY=
+ ENCL_ID=
+ UNIQ_ENCL_ID=
+ getopts c:d:eg:jmp:h OPTION
+ DEV=sdx
+ getopts c:d:eg:jmp:h OPTION
+ [ ! -r /etc/zfs/vdev_id.conf ]
+ [ -z sdx ]
+ [ -z ]
+ awk ($1 == "topology") {print $2; exit} /etc/zfs/vdev_id.conf
+ TOPOLOGY=sas_direct
+ [ -z ]
+ awk ($1 == "slot") {print $2; exit} /etc/zfs/vdev_id.conf
+ BAY=bay
+ TOPOLOGY=sas_direct
+ [ = yes ]
+ alias_handler
+ DM_PART=
+ echo
+ grep -q -E p[0-9][0-9]*$
+ ID_VDEV=
+ [ -z ]
+ BAY=bay
+ sas_handler
+ [ -z ]
+ awk $1 == "phys_per_port" \
{print $2; exit} /etc/zfs/vdev_id.conf
+ PHYS_PER_PORT=4
+ PHYS_PER_PORT=4
+ echo 4
+ grep -q -E ^[0-9]+$
+ [ -z ]
+ awk $1 == "multipath" \
{print $2; exit} /etc/zfs/vdev_id.conf
+ MULTIPATH_MODE=no
+ [ -z ]
+ awk $1 == "multijbod" \
{print $2; exit} /etc/zfs/vdev_id.conf
+ MULTIJBOD_MODE=
+ [ no = yes ]
+ echo sdx
+ grep -q ^/devices/
+ udevadm info -q path -p /sys/block/sdx
+ sys_path=/devices/pci0000:00/0000:00:03.2/0000:20:00.0/host2/port-2:4/expander-2:0/port-2:0:0/end_device-2:0:0/target2:0:4/2:0:4:0/block/sdx
+ echo /devices/pci0000:00/0000:00:03.2/0000:20:00.0/host2/port-2:4/expander-2:0/port-2:0:0/end_device-2:0:0/target2:0:4/2:0:4:0/block/sdx
+ tr /
+ set -- devices pci0000:00 0000:00:03.2 0000:20:00.0 host2 port-2:4 expander-2:0 port-2:0:0 end_device-2:0:0 target2:0:4 2:0:4:0 block sdx
+ num_dirs=13
+ scsi_host_dir=/sys
+ i=1
+ [ 1 -le 13 ]
+ eval echo ${1}
+ echo devices
+ d=devices
+ scsi_host_dir=/sys/devices
+ echo devices
+ grep -q -E ^host[0-9]+$
+ i=2
+ [ 2 -le 13 ]
+ eval echo ${2}
+ echo pci0000:00
+ d=pci0000:00
+ scsi_host_dir=/sys/devices/pci0000:00
+ echo pci0000:00
+ grep -q -E ^host[0-9]+$
+ i=3
+ [ 3 -le 13 ]
+ eval echo ${3}
+ echo 0000:00:03.2
+ d=0000:00:03.2
+ scsi_host_dir=/sys/devices/pci0000:00/0000:00:03.2
+ echo 0000:00:03.2
+ grep -q -E ^host[0-9]+$
+ i=4
+ [ 4 -le 13 ]
+ eval echo ${4}
+ echo 0000:20:00.0
+ d=0000:20:00.0
+ scsi_host_dir=/sys/devices/pci0000:00/0000:00:03.2/0000:20:00.0
+ echo 0000:20:00.0
+ grep -q -E ^host[0-9]+$
+ i=5
+ [ 5 -le 13 ]
+ eval echo ${5}
+ echo host2
+ d=host2
+ scsi_host_dir=/sys/devices/pci0000:00/0000:00:03.2/0000:20:00.0/host2
+ echo host2
+ grep -q -E ^host[0-9]+$
+ break
+ echo host2
+ awk -F/ { gsub("host","",$NF); print $NF}
+ HOSTCHAN=2
+ [ 5 = 13 ]
+ eval echo ${4}
+ echo 0000:20:00.0
+ awk -F: {print $2":"$3}
+ PCI_ID=20:00.0
+ port_dir=/sys/devices/pci0000:00/0000:00:03.2/0000:20:00.0/host2
+ j=6
+ i=6
+ [ 6 -le 6 ]
+ eval echo ${6}
+ echo port-2:4
+ port_dir=/sys/devices/pci0000:00/0000:00:03.2/0000:20:00.0/host2/port-2:4
+ i=7
+ [ 7 -le 6 ]
+ ls -d /sys/devices/pci0000:00/0000:00:03.2/0000:20:00.0/host2/port-2:4/phy-2:0 /sys/devices/pci0000:00/0000:00:03.2/0000:20:00.0/host2/port-2:4/phy-2:1 /sys/devices/pci0000:00/0000:00:03.2/0000:20:00.0/host2/port-2:4/phy-2:2 /sys/devices/pci0000:00/0000:00:03.2/0000:20:00.0/host2/port-2:4/phy-2:3
+ head -1
+ awk -F: {print $NF}
+ PHY=0
+ [ -z 0 ]
+ PORT=0
+ end_device_dir=/sys/devices/pci0000:00/0000:00:03.2/0000:20:00.0/host2/port-2:4
+ [ 7 -lt 13 ]
+ eval echo ${7}
+ echo expander-2:0
+ d=expander-2:0
+ end_device_dir=/sys/devices/pci0000:00/0000:00:03.2/0000:20:00.0/host2/port-2:4/expander-2:0
+ echo expander-2:0
+ grep -q ^end_device
+ i=8
+ [ 8 -lt 13 ]
+ eval echo ${8}
+ echo port-2:0:0
+ d=port-2:0:0
+ end_device_dir=/sys/devices/pci0000:00/0000:00:03.2/0000:20:00.0/host2/port-2:4/expander-2:0/port-2:0:0
+ echo port-2:0:0
+ grep -q ^end_device
+ i=9
+ [ 9 -lt 13 ]
+ eval echo ${9}
+ echo end_device-2:0:0
+ d=end_device-2:0:0
+ end_device_dir=/sys/devices/pci0000:00/0000:00:03.2/0000:20:00.0/host2/port-2:4/expander-2:0/port-2:0:0/end_device-2:0:0
+ echo end_device-2:0:0
+ grep -q ^end_device
+ end_device_dir=/sys/devices/pci0000:00/0000:00:03.2/0000:20:00.0/host2/port-2:4/expander-2:0/port-2:0:0/end_device-2:0:0/sas_device/end_device-2:0:0
+ break
+ SLOT=
+ cat /sys/devices/pci0000:00/0000:00:03.2/0000:20:00.0/host2/port-2:4/expander-2:0/port-2:0:0/end_device-2:0:0/sas_device/end_device-2:0:0/bay_identifier
+ SLOT=0
+ [ -z 0 ]
+ [ = yes ]
+ map_channel 20:00.0 0
+ MAPPED_CHAN=
+ PCI_ID=20:00.0
+ PORT=0
+ awk -v pciID=20:00.0 -v port=0 $1 == "channel" && $2 == pciID && $3 == port \
{print $4} /etc/zfs/vdev_id.conf
+ MAPPED_CHAN=enc10d
+ printf %s enc10d
+ CHAN=enc10d
+ map_slot 0 enc10d
+ LINUX_SLOT=0
+ CHANNEL=enc10d
+ awk -v linux_slot=0 -v channel=enc10d $1 == "slot" && $2 == linux_slot && \
$4 ~ "^"channel"$" { print $3; exit} /etc/zfs/vdev_id.conf
+ MAPPED_SLOT=
+ [ -z ]
+ MAPPED_SLOT=0
+ printf %d 0
+ SLOT=0
+ [ -z enc10d ]
+ echo enc10d0
+ ID_VDEV=enc10d0
+ [ -n enc10d0 ]
+ echo ID_VDEV=enc10d0
ID_VDEV=enc10d0
+ echo ID_VDEV_PATH=disk/by-vdev/enc10d0
ID_VDEV_PATH=disk/by-vdev/enc10d0
For this drive I would have expected this vdev_id.conf line to trigger:
slot 0 1
and map this drive to enc10d1.
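For reference, a simplified side-by-side of the two awk patterns seen in the traces (hypothetical standalone commands, values taken from the debug run above): the /^${CHANNEL}$|^$/ pattern in the first debug run was meant to accept either the named channel or an empty fourth column (that is what lets a channel-less fallback line like slot 0 1 apply to every channel), while the awk call in the run above only accepts an exact channel match, so the fallback line is skipped.

LINUX_SLOT=0
CHANNEL=enc10d
CONFIG=/etc/zfs/vdev_id.conf    # contains the fallback line:  slot 0 1

# Fallback-aware pattern: the 4th column may equal the channel OR be empty.
awk -v linux_slot="$LINUX_SLOT" -v channel="$CHANNEL" \
    '$1 == "slot" && $2 == linux_slot &&
     $4 ~ "^"channel"$|^$" {print $3; exit}' "$CONFIG"    # prints 1

# Pattern from the run above: exact channel match only, so MAPPED_SLOT stays empty.
awk -v linux_slot="$LINUX_SLOT" -v channel="$CHANNEL" \
    '$1 == "slot" && $2 == linux_slot &&
     $4 ~ "^"channel"$" {print $3; exit}' "$CONFIG"       # prints nothing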
@ZNikke , thanks for trying out the patch. For the fallback mapping I will update the patch shortly.
@ZNikke , thanks for trying out the patch. For the fallback mapping I will update the patch shortly.
@ZNikke, please try out the revised patch https://github.com/openzfs/zfs/pull/11959 and let us know the result. Thanks!
I'll give this a test on our enclosures
With the revised patch the result is as expected:
# ls --hide="*-part*" /dev/disk/by-vdev/
c0v1 c3bay9 c4bay15 enc10d12 enc10d5 enc10d9 enc9d12 enc9d5 enc9d9
c3bay10 c4bay12 enc10d1 enc10d2 enc10d6 enc9d1 enc9d2 enc9d6
c3bay11 c4bay13 enc10d10 enc10d3 enc10d7 enc9d10 enc9d3 enc9d7
c3bay8 c4bay14 enc10d11 enc10d4 enc10d8 enc9d11 enc9d4 enc9d8
However, if I comment out the slot bay line in vdev_id.conf I only get c0v1 mapped in by-vdev/. This should mean that the file vdev_id.conf.sas_direct.example as shipped doesn't actually work. I think the best way forward would be to fix this bug too, as slot bay should be the default according to the vdev_id.conf man page.
With #slot bay in vdev_id.conf a debug run gives:
# sh -x /usr/lib/udev/vdev_id -d sdx
+ PATH=/bin:/sbin:/usr/bin:/usr/sbin
+ CONFIG=/etc/zfs/vdev_id.conf
+ PHYS_PER_PORT=
+ DEV=
+ TOPOLOGY=
+ BAY=
+ ENCL_ID=
+ UNIQ_ENCL_ID=
+ getopts c:d:eg:jmp:h OPTION
+ DEV=sdx
+ getopts c:d:eg:jmp:h OPTION
+ [ ! -r /etc/zfs/vdev_id.conf ]
+ [ -z sdx ]
+ [ -z ]
+ awk ($1 == "topology") {print $2; exit} /etc/zfs/vdev_id.conf
+ TOPOLOGY=sas_direct
+ [ -z ]
+ awk ($1 == "slot") {print $2; exit} /etc/zfs/vdev_id.conf
+ BAY=4
+ TOPOLOGY=sas_direct
+ [ = yes ]
+ alias_handler
+ DM_PART=
+ echo
+ grep -q -E p[0-9][0-9]*$
+ ID_VDEV=
+ [ -z ]
+ BAY=4
+ sas_handler
+ [ -z ]
+ awk $1 == "phys_per_port" \
{print $2; exit} /etc/zfs/vdev_id.conf
+ PHYS_PER_PORT=4
+ PHYS_PER_PORT=4
+ echo 4
+ grep -q -E ^[0-9]+$
+ [ -z ]
+ awk $1 == "multipath" \
{print $2; exit} /etc/zfs/vdev_id.conf
+ MULTIPATH_MODE=no
+ [ -z ]
+ awk $1 == "multijbod" \
{print $2; exit} /etc/zfs/vdev_id.conf
+ MULTIJBOD_MODE=
+ [ no = yes ]
+ echo sdx
+ grep -q ^/devices/
+ udevadm info -q path -p /sys/block/sdx
+ sys_path=/devices/pci0000:00/0000:00:03.2/0000:20:00.0/host2/port-2:4/expander-2:0/port-2:0:0/end_device-2:0:0/target2:0:4/2:0:4:0/block/sdx
+ echo /devices/pci0000:00/0000:00:03.2/0000:20:00.0/host2/port-2:4/expander-2:0/port-2:0:0/end_device-2:0:0/target2:0:4/2:0:4:0/block/sdx
+ tr /
+ set -- devices pci0000:00 0000:00:03.2 0000:20:00.0 host2 port-2:4 expander-2:0 port-2:0:0 end_device-2:0:0 target2:0:4 2:0:4:0 block sdx
+ num_dirs=13
+ scsi_host_dir=/sys
+ i=1
+ [ 1 -le 13 ]
+ eval echo ${1}
+ echo devices
+ d=devices
+ scsi_host_dir=/sys/devices
+ echo devices
+ grep -q -E ^host[0-9]+$
+ i=2
+ [ 2 -le 13 ]
+ eval echo ${2}
+ echo pci0000:00
+ d=pci0000:00
+ scsi_host_dir=/sys/devices/pci0000:00
+ echo pci0000:00
+ grep -q -E ^host[0-9]+$
+ i=3
+ [ 3 -le 13 ]
+ eval echo ${3}
+ echo 0000:00:03.2
+ d=0000:00:03.2
+ scsi_host_dir=/sys/devices/pci0000:00/0000:00:03.2
+ echo 0000:00:03.2
+ grep -q -E ^host[0-9]+$
+ i=4
+ [ 4 -le 13 ]
+ eval echo ${4}
+ echo 0000:20:00.0
+ d=0000:20:00.0
+ scsi_host_dir=/sys/devices/pci0000:00/0000:00:03.2/0000:20:00.0
+ echo 0000:20:00.0
+ grep -q -E ^host[0-9]+$
+ i=5
+ [ 5 -le 13 ]
+ eval echo ${5}
+ echo host2
+ d=host2
+ scsi_host_dir=/sys/devices/pci0000:00/0000:00:03.2/0000:20:00.0/host2
+ echo host2
+ grep -q -E ^host[0-9]+$
+ break
+ echo host2
+ awk -F/ { gsub("host","",$NF); print $NF}
+ HOSTCHAN=2
+ [ 5 = 13 ]
+ eval echo ${4}
+ echo 0000:20:00.0
+ awk -F: {print $2":"$3}
+ PCI_ID=20:00.0
+ port_dir=/sys/devices/pci0000:00/0000:00:03.2/0000:20:00.0/host2
+ j=6
+ i=6
+ [ 6 -le 6 ]
+ eval echo ${6}
+ echo port-2:4
+ port_dir=/sys/devices/pci0000:00/0000:00:03.2/0000:20:00.0/host2/port-2:4
+ i=7
+ [ 7 -le 6 ]
+ ls -d /sys/devices/pci0000:00/0000:00:03.2/0000:20:00.0/host2/port-2:4/phy-2:0 /sys/devices/pci0000:00/0000:00:03.2/0000:20:00.0/host2/port-2:4/phy-2:1 /sys/devices/pci0000:00/0000:00:03.2/0000:20:00.0/host2/port-2:4/phy-2:2 /sys/devices/pci0000:00/0000:00:03.2/0000:20:00.0/host2/port-2:4/phy-2:3
+ head -1
+ awk -F: {print $NF}
+ PHY=0
+ [ -z 0 ]
+ PORT=0
+ end_device_dir=/sys/devices/pci0000:00/0000:00:03.2/0000:20:00.0/host2/port-2:4
+ [ 7 -lt 13 ]
+ eval echo ${7}
+ echo expander-2:0
+ d=expander-2:0
+ end_device_dir=/sys/devices/pci0000:00/0000:00:03.2/0000:20:00.0/host2/port-2:4/expander-2:0
+ echo expander-2:0
+ grep -q ^end_device
+ i=8
+ [ 8 -lt 13 ]
+ eval echo ${8}
+ echo port-2:0:0
+ d=port-2:0:0
+ end_device_dir=/sys/devices/pci0000:00/0000:00:03.2/0000:20:00.0/host2/port-2:4/expander-2:0/port-2:0:0
+ echo port-2:0:0
+ grep -q ^end_device
+ i=9
+ [ 9 -lt 13 ]
+ eval echo ${9}
+ echo end_device-2:0:0
+ d=end_device-2:0:0
+ end_device_dir=/sys/devices/pci0000:00/0000:00:03.2/0000:20:00.0/host2/port-2:4/expander-2:0/port-2:0:0/end_device-2:0:0
+ echo end_device-2:0:0
+ grep -q ^end_device
+ end_device_dir=/sys/devices/pci0000:00/0000:00:03.2/0000:20:00.0/host2/port-2:4/expander-2:0/port-2:0:0/end_device-2:0:0/sas_device/end_device-2:0:0
+ break
+ SLOT=
+ [ -z ]
+ return
+ ID_VDEV=
+ [ -n ]
...
@ZNikke ,
With the revised patch the result is as expected:
Thank you for confirming.
However, if I comment out the slot bay line in vdev_id.conf I only get c0v1 mapped in by-vdev/
The reason for this is that when you comment out the "slot bay" line in vdev_id.conf, instead of picking up the default, which is 'bay', it picks up the first line matching 'slot' in the 'mappings' section. From the example debug run you posted...
awk ($1 == "slot") {print $2; exit} /etc/zfs/vdev_id.conf
BAY=4
The '4' comes from the first line that matches 'slot' once 'slot bay' is removed.
# Drives in internal bays
slot 4 8 c3bay
This is clearly incorrect, which leads to the wrong values we are getting.
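In other words, a simplified reproduction of that lookup (using a hypothetical two-line config file that mirrors the lines quoted above):

# With "slot bay" commented out, the first line whose first field is "slot"
# is a slot *mapping* entry, so its second column becomes BAY.
printf '%s\n' '#slot bay' 'slot 4 8 c3bay' > /tmp/vdev_id.conf.test
awk '($1 == "slot") {print $2; exit}' /tmp/vdev_id.conf.test    # prints 4, not bay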
The question is, did it ever run correctly? The code changes under https://github.com/jsai20/zfs/commit/1f9db7f5936b6c4c4a1f3f17bfff3e86789e2033 did not change this part, and it should be failing under the old version as well. Could you please confirm whether this worked in an older version or not? If not, then we should of course still fix it, but in a new bug, and leave this issue for the regression you originally filed.
Second, if we look at the man page, "bay" is optional and taken as the default. However, the key (slot) should be present. IMHO, removing the entire "slot bay" line is incorrect usage. I am commenting purely from the code I read. I will let other folks comment on whether this is indeed a bug that needs to be fixed or not. (@AeonJJohnson , @gdevenyi , @tonyhutter , @behlendorf )
slot <bay|phy|id|lun>
Specifies from which element of a SAS identifier the slot number is taken. The default is bay.
I'm pretty sure this has been broken for quite some time; I have vivid memories of fighting with vdev_id.conf for over a week without understanding how it could ever work, until I restarted from scratch by cut-and-pasting an example with the defaults explicitly listed.
In any case, either the bug needs to be fixed or the shipped vdev_id.conf.sas_direct.example and the vdev_id.conf man page need to be updated to conform to the script behavior.
I'm guessing fixing the script is less work (should be as easy as only accepting a line which is valid for the slot <bay|phy|id|lun> spec for the assignment).
If it should be done within this issue or in a separate issue is another question I guess.
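For what it's worth, the restriction I have in mind would be something along these lines (an untested sketch based on the slot <bay|phy|id|lun> spec from the man page, not an actual patch):

# Only accept a value that is valid for the "slot" keyword, so that slot
# mapping lines such as "slot 4 8 c3bay" no longer shadow the default.
BAY=$(awk '$1 == "slot" && $2 ~ /^(bay|phy|id|lun)$/ {print $2; exit}' /etc/zfs/vdev_id.conf)
BAY=${BAY:-bay}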
I'm pretty sure this has been broken for quite some time
Understood. Then, I would say, this is expected behavior (as per the code). It is not part of the regression.
In any case, either the bug needs to be fixed or the shipped vdev_id.conf.sas_direct.example and the vdev_id.conf man page need to be updated to conform to the script behavior.
As per the code, it expects 'slot' with 'bay' as the default. So the man page should be updated if that is confusing for users.
I'm guessing fixing the script is less work (should be as easy as only accepting a line which is valid for the slot <bay|phy|id|lun> spec for the assignment).
The code assumes 'bay' as default.
if [ -z "$ID_VDEV" ] ; then
BAY=${BAY:-bay}
If we change the behavior, we might end up breaking a few cases where only 'slot' is mentioned by the user. It would be good to update the man page, IMO.
I think we may be going into creeping-elegance territory here. The root cause of the original complaint was identified and fixed. We should avoid fine-tuning for a single use case.
One of the challenges with this entire subsystem is that the SES enclosure standard is a loose standard, and how it is applied by various platform, hardware and enclosure manufacturers varies. The slot bay function isn't broken, it just might not work correctly on some hardware. If you take a look at the sysfs enclosure tree on your system and see what enclosure pointers exist under the end_device subdir, you will get an idea of whether bay, phy, id, ses or mix will work best for your specific hardware. This works fine for the megascale "a list" hardware like HGST, Quanta, Newisys and NetApp. Others, based on their firmware robustness, can be less so. Adaptec HBAs, for example, make strange representations in sysfs; LSI-based HBAs run flawlessly (as long as you don't have their RAID 0/1/1E "IR" firmware loaded).
If you descend into the sysfs tree under /sys/class/sas_host you'll find the enclosure device subdir for each disk (after several subdir levels) that contains the SES pointers the script pulls data from. In the case below (HGST UltraData60 JBOD) you'll see the slot special file that contains the numeric id for the enclosure slot number:
[root@localhost 10:0:25:0]# ls enclosure_device\:SLOT\ 07\,8HJ2WHZH\ \ \ \ \ \ \ \ \ \ \ \ /
active device fault locate power power_status slot status type uevent
[root@localhost 10:0:25:0]# cat enclosure_device\:SLOT\ 07\,8HJ2WHZH\ \ \ \ \ \ \ \ \ \ \ \ /slot
7
Based upon the specific combination of HBA and enclosure, one of the other settings may work better. With this newer version of vdev_id I added the "mix" use case because of a hardware situation I discovered where a "slot compliant" JBOD may be connected to a host that also has directly connected disks inside it: direct cable to slot, with no SAS expander or SES function in the middle. This gave vdev_id the ability to enumerate disks both in a SES JBOD and directly connected.
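If it helps, a quick way to see what a given system actually populates is to walk the SAS end devices in sysfs (a rough sketch; paths follow the examples above and can differ per platform):

# List which SAS end devices report a bay_identifier on this system; if it is
# missing or empty, "bay" is the wrong slot setting for that hardware.
for d in /sys/class/sas_device/end_device-*; do
    [ -r "$d/bay_identifier" ] || continue
    printf '%s bay=%s\n' "${d##*/}" "$(cat "$d/bay_identifier")"
done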
Maybe polishing some of what I said above and adding it to the man page would be a better path than tweaking vdev_id for site-specific use cases.
I agree that the original issue is fixed.
However, for the "slot bay" thing, my main point is that the shipped example file won't work as is, and the man page is confusing, but I'll readily agree that it's likely best handled in a separate issue :)
Although I have to add that the mapping code is really not about dealing with the hardware: it runs after that stage, taking whatever the hardware handling produced and mapping the IDs into the scheme the user wants (based on human preferences such as naming conventions and matching IDs to what's printed on the boxes, etc.).
And btw, the slot mix mode is mentioned neither in the vdev_id.conf man page nor in an example file. It is documented in vdev_id itself, but it's not obvious to look there since there is a man page, so getting the man page up to speed sounds like a good idea indeed!
The double-use of the slot keyword is confusing, but that's also another debate...
Thanks everybody for the quick fix and all the testing. I'll go ahead and get it merged and the documentation updates can be handled in a new PR.
System information
Describe the problem you're observing
It looks to me that slot remapping has stopped working sometime between zfs-0.8.3 (actually 0.8.3-1ubuntu12.8) and zfs-2.1.0-rc4.
Our disks end up at different IDs depending on the enclosure model, none of them matching the physical 1-12 numbering on the enclosure, so we want to map them all to match the 1-12 numbering.
Our previously working vdev_id.conf seems to still map the channels, but the disks end up with whatever slot the enclosures use. As the documentation and examples for vdev_id.conf suggest this should still work, I think it's a regression.
Describe how to reproduce the problem
Given this vdev_id.conf:
We get this mapping:
I.e. no slot remapping was done! The expected result of the slot mapping would be for c3bay and c4bay to have the disks in the 8 .. 15 sequence, and for the enc9/enc10 enclosures to have the disks in the 1 .. 12 sequence.
Things get extra interesting if I remove the slot bay assignment; then I only get c0v1 mapped in by-vdev/. The vdev_id.conf man page says that bay is the default, so it really shouldn't be needed to specify it. On this subject it's interesting to note that vdev_id.conf.sas_direct.example doesn't mention slot bay, but the example in the vdev_id.conf man page does. I personally think that the double-use of the slot keyword is a mistake, it's very confusing...
Include any warning/errors/backtraces from the system logs