stanford-rc / sasutils

Serial Attached SCSI (SAS) Linux utilities and Python library
Apache License 2.0

Intermittent missing symlinks with sas_mpath_snic_alias; possible timing issue? #29

Open OsmiumBalloon opened 3 months ago

OsmiumBalloon commented 3 months ago

Summary

Environment

Configuration

Symptoms

Investigation

Good behaviors

Problem behaviors

Workaround

Details

import re
import sys
from time import sleep
from pkg_resources import load_entry_point

sys.argv[0] = re.sub(r'(-script\.pyw?|\.exe)?$', '', sys.argv[0])
delay = sys.argv[1]     # assuming a single argument (the kernel name), I hope that's right
delay = delay[3:]       # extract the number from an argument like "dm-37"
delay = int(delay)      # make sure it is an integer
delay = delay * 0.04    # add a 40 millisecond delay for each additional map
delay = delay + 0.25    # minimum 250 millisecond delay
sleep(delay)
sys.exit(load_entry_point('sasutils==0.5.0', 'console_scripts', 'sas_mpath_snic_alias')())

A proper solution would likely be in the main part of the code library, but I had neither the time nor the skill to delve that deeply.
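
For reference, the delay arithmetic in the snippet above can be pulled out into a small function that is easier to test on its own. This is only a sketch: the name `udev_delay`, its parameters, and the choice to fall back to the minimum delay for names that do not look like "dm-N" are assumptions for illustration, not part of sasutils.

```python
import re


def udev_delay(kernel_name, per_map=0.04, minimum=0.25):
    """Return the sleep time in seconds for a kernel name such as "dm-37".

    Hypothetical helper: 40 ms per map number plus a 250 ms floor,
    mirroring the arithmetic in the workaround above.
    """
    match = re.fullmatch(r"dm-(\d+)", kernel_name)
    if match is None:
        # Assumption: an unexpected name gets only the minimum delay
        # instead of raising an exception inside a udev helper.
        return minimum
    return int(match.group(1)) * per_map + minimum
```

With these defaults, "dm-37" works out to 37 * 0.04 + 0.25, roughly 1.73 seconds, so higher-numbered maps wait longer before the alias lookup runs.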

thiell commented 3 months ago

Thanks for your detailed report @OsmiumBalloon. I have also seen this kind of issue on very large systems. I am not sure whether the root cause lies with udev, the mpt3sas driver, the enclosure's SES "little brain", or the kernel itself. Like you, I also found that the udev logs are not really helpful...

I work around it by wrapping sas_mpath_snic_alias in a script that calls it once and, when the returned alias is not what we expect, retries up to 3 more times with a random delay between attempts. The udev rule then calls this wrapper, for example:

KERNEL=="dm-[0-9]*", PROGRAM="/usr/bin/oak_udev_sas_mpath_snic_alias %k", SYMLINK+="mapper/%c"

The wrapper script /usr/bin/oak_udev_sas_mpath_snic_alias being:

#!/bin/bash

DBGFILE=/tmp/udev_sas_mpath_snic_alias.log
DEV=$1

for i in {1..4}
do
    alias=$(/usr/bin/sas_mpath_snic_alias "$DEV" 2>>"$DBGFILE")
    if [[ $alias =~ ^io[0-9]+-jbod[1-8]-bay[0-9]+$ ]]; then           # <<< change alias regex here
        echo "$DEV: alias \"$alias\" accepted" >>$DBGFILE
        break
    else
        echo "$DEV: alias \"$alias\" not valid" >>$DBGFILE
        usleep $(( 10 + RANDOM * 50 ))   # random delay in microseconds, up to about 1.6 s
    fi
done

echo "$alias"

With this wrapper, I have been able to reliably get all aliases set up within a few minutes of boot. It seems a bit too site-specific to integrate into sasutils as is, but let me know what you think.
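
If a Python version is easier to adapt, the same retry idea might look roughly like the sketch below. The function names (`run_alias_tool`, `resolve_alias`) and the injectable `lookup` and `sleep` parameters are hypothetical and only there to make the logic testable; the alias pattern is the site-specific one from the bash wrapper and would need changing elsewhere.

```python
import random
import re
import subprocess
import time

# Site-specific pattern copied from the bash wrapper above; adjust as needed.
ALIAS_RE = re.compile(r"io[0-9]+-jbod[1-8]-bay[0-9]+")


def run_alias_tool(dev):
    """Run sas_mpath_snic_alias for one device and return its stdout."""
    result = subprocess.run(
        ["/usr/bin/sas_mpath_snic_alias", dev],
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def resolve_alias(dev, lookup=run_alias_tool, attempts=4, sleep=time.sleep):
    """Call lookup(dev) up to `attempts` times until the alias looks valid.

    Like the bash wrapper, the last alias seen is returned even when it
    never matched, so the caller still gets some output.
    """
    alias = ""
    for _ in range(attempts):
        alias = lookup(dev)
        if ALIAS_RE.fullmatch(alias):
            break
        # Same range as the bash version: 10 + RANDOM * 50 microseconds.
        sleep((10 + random.randint(0, 32767) * 50) / 1_000_000)
    return alias
```

A udev helper built on this would simply print `resolve_alias(sys.argv[1])`; passing a fake `lookup` makes it possible to exercise the retry logic without any SAS hardware.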