Open OsmiumBalloon opened 3 months ago
Thanks for your detailed report @OsmiumBalloon. I have also noticed such issues with very large systems. I am not sure whether the root cause lies with udev, the mpt3sas driver, the enclosure's little SES brain, or the kernel itself. Like you, I also found that the udev logs are not really helpful...
I work around this by wrapping sas_mpath_snic_alias in a script that calls it once and retries up to 3 more times, with a random delay, when the returned alias is not what we expect. This wrapper is then called by the udev rule:
Example with:
KERNEL=="dm-[0-9]*", PROGRAM="/usr/bin/oak_udev_sas_mpath_snic_alias %k", SYMLINK+="mapper/%c"
The wrapper script /usr/bin/oak_udev_sas_mpath_snic_alias being:
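#!/bin/bash
# Retry sas_mpath_snic_alias until it returns an alias of the expected form.
DBGFILE=/tmp/udev_sas_mpath_snic_alias.log
DEV=$1
for i in {1..4}
do
    alias=$(/usr/bin/sas_mpath_snic_alias "$DEV" 2>>"$DBGFILE")
    if [[ $alias =~ ^io[0-9]+-jbod[1-8]-bay[0-9]+$ ]]; then # <<< change alias regex here
        echo "$DEV: alias \"$alias\" accepted" >>"$DBGFILE"
        break
    else
        echo "$DEV: alias \"$alias\" not valid" >>"$DBGFILE"
        # Random delay in microseconds before the next attempt
        usleep $(( 10 + RANDOM * 50 ))
    fi
done
# Print whatever alias was returned last, valid or not
echo "$alias"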
With this wrapper, I have been able to reliably get all aliases set up a few minutes after boot time. However, it seems to be a little bit too specific to be integrated into sasutils, but let me know what you think.
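For readers who would rather keep the retry logic in Python, here is a minimal sketch of the same idea. The function names `query_alias` and `resolve_alias` and their parameters are made up for illustration and are not part of sasutils. The regex is the one from the shell wrapper and needs the same site-specific adjustment.

```python
import random
import re
import subprocess
import sys
import time

# Pattern copied from the shell wrapper; change it to match your own aliases.
ALIAS_RE = re.compile(r"io[0-9]+-jbod[1-8]-bay[0-9]+")


def query_alias(dev):
    """Run sas_mpath_snic_alias for one dm device and return its stripped stdout."""
    result = subprocess.run(
        ["/usr/bin/sas_mpath_snic_alias", dev],
        capture_output=True, text=True, check=False,
    )
    return result.stdout.strip()


def resolve_alias(dev, query=query_alias, attempts=4, sleep=time.sleep):
    """Return the last alias seen, retrying while it does not match ALIAS_RE."""
    alias = ""
    for attempt in range(attempts):
        alias = query(dev)
        if ALIAS_RE.fullmatch(alias):
            break
        if attempt < attempts - 1:
            # Random pause of up to about 1.6 s, roughly the range of the
            # usleep call in the shell version (hypothetical choice).
            sleep(random.uniform(0.00001, 1.6))
    return alias


if __name__ == "__main__" and len(sys.argv) == 2:
    # udev reads the alias from stdout, as with the shell wrapper.
    print(resolve_alias(sys.argv[1]))
```

Passing `query` and `sleep` as parameters is only there so the loop can be exercised without real hardware. Unlike the shell version, this sketch does not sleep after the final failed attempt.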
Summary
sas_mpath_snic_alias script in udev rules
sas_mpath_snic_alias seems to alleviate the problem
Environment
9500-16e
ST20000NM002D disks
CSE-847E2C-R1K23JBOD enclosures (w/ redundant expanders)
6.1.0-21-amd64 / 6.1.90-1 (2024-05-03)
3.11.2
0.9.4-3+deb12u1
0.5.0
Configuration
/etc/multipath.conf says in part:
user_friendly_names no
find_multipaths yes
path_grouping_policy multibus
/etc/udev/rules.d/sasutils.rules says:
KERNEL=="dm-[0-9]*", PROGRAM="/usr/local/bin/sas_mpath_snic_alias_delayed %k", SYMLINK+="mapper/%c"
sg_ses has been used to assign nicknames to the enclosures, such as:
SHLF_1_FRNT_PRI (disk shelf 1, front backplane, primary expander)
SHLF_1_FRNT_SEC (disk shelf 1, front backplane, secondary expander)
SHLF_1_REAR_PRI (disk shelf 1, rear backplane, primary expander)
SHLF_2_FRNT_PRI (disk shelf 2, front backplane, primary expander)
Symptoms
/dev/mapper/SHLF_1_FRNT-bay00 to appear for every physical disk
Investigation
Good behaviors
/dev/sd*)
/dev/mapper/35000000000000000 symlinks always appear for all disks
sas_devices -v has always reported all devices and enclosures, with proper slots
lsscsi has always reported all devices
multipath -l has always reported all devices, with two disks per map
Problem behaviors
/dev/mapper/SHLF_1_FRNT-bay00 missing several times
udevadm trigger a few times; it has always caused the missing nodes to appear
sas_sd_snic_alias
SHLF_1_FRNT_PRI-bay00
/dev/disk/by-bay/naa.5000000000000000-bay09
udev_log to debug in /etc/udev/udev.conf but the results have not been particularly illuminating
/etc/udev/rules.d/sasutils.rules:11 Command "/usr/local/bin/sas_mpath_snic_alias_delayed dm-0" returned 1 (error)
dm- devices even appear in the log, even when everything is working (???)
Workaround
sas_mpath_snic_alias seems to have alleviated the problem
Details
sas_mpath_snic_alias file itself
from time import sleep near top
dm-NN number passed as first argument, after sys.argv processing, but before load_entry_point, as follows:
A proper solution would likely be in the main part of the code library, but I had neither the time nor the skill to delve that deeply.
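A minimal sketch of a delay keyed on the dm-NN number, in the spirit of the change described under Details. The function name `stagger_delay`, the `step` and `max_delay` values, and the formula itself are assumptions made for illustration. They are not the code actually used, and nothing here is part of sasutils.

```python
import re
import sys
from time import sleep


def stagger_delay(dev_name, step=0.05, max_delay=5.0):
    """Map a name such as 'dm-12' to a delay in seconds (hypothetical formula).

    Returns 0.0 when the name does not end in a number.
    """
    match = re.search(r"(\d+)$", dev_name)
    if match is None:
        return 0.0
    # Higher-numbered devices wait longer, capped at max_delay.
    return min(int(match.group(1)) * step, max_delay)


if __name__ == "__main__" and len(sys.argv) == 2:
    # In a patched entry script this would sit after the sys.argv
    # processing and before the call to load_entry_point.
    sleep(stagger_delay(sys.argv[1]))
```

Spreading the calls out by device number means many dm devices appearing at once do not all query the enclosure at the same instant. Whether that is the reason the delay helps is not established in this thread.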