When using check_cmd_dmesg() directly (as written in scripts/lbnl_cmd.nhc) with a negated match string, the default behavior of check_cmd_output() (which check_cmd_dmesg() wraps) used for error reporting causes the "Reason" field to contain not only the match string that was found (and shouldn't have been) but also the line number where the match was found. In the case of dmesg output, the line number is almost completely useless; moreover, it prevents Slurm and other schedulers/RMs from being able to group all the affected nodes together -- because the line numbers almost always differ!
Granted, users/admins can override the default failure message generation behavior (via -M entries, all of which are passed directly to check_cmd_output()), but in the specific case of check_cmd_dmesg(), I think the default behavior should suppress the line numbers and use a simpler, more concise message instead.
This changeset does exactly that by adding a bit of pre-processing to the command-line arguments passed to check_cmd_dmesg() before passing them on to check_cmd_output(). Each match string (-m argument) that doesn't already have a corresponding message (-M argument) to override the default will have a new default provided to it that omits the extraneous information. In other words, any -m mstr that already has a matching -M message will be passed on to check_cmd_output() exactly as it is; any -m mstr that lacks a corresponding -M message (or that has an empty message as a placeholder) will be assigned a new -M message that gets passed to check_cmd_output() without any line number or other dynamic information.
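As a rough illustration of the pre-processing described above (a minimal sketch, not the code in this changeset), the pairing logic could look something like the following. The function name dmesg_default_messages, the NEW_ARGS array, and the wording of the default message are all hypothetical. The sketch also assumes that the Nth -M message belongs to the Nth -m match string, and that any other arguments can be passed through untouched.

```bash
#!/usr/bin/env bash
# Hypothetical helper, NOT the actual NHC implementation.
# Builds NEW_ARGS, the argument list a dmesg wrapper might hand on to
# check_cmd_output(). Assumption: the Nth -M pairs with the Nth -m.
dmesg_default_messages() {
    local -a mstrs=() msgs=() others=()
    local i msg
    NEW_ARGS=()

    # Split the arguments into match strings, messages, and everything else.
    # Both "-m mstr" and "-mmstr" spellings are accepted.
    while (( $# > 0 )); do
        case "$1" in
            -m)  mstrs+=("${2-}"); shift $(( $# > 1 ? 2 : 1 )) ;;
            -m*) mstrs+=("${1#-m}"); shift ;;
            -M)  msgs+=("${2-}"); shift $(( $# > 1 ? 2 : 1 )) ;;
            -M*) msgs+=("${1#-M}"); shift ;;
            *)   others+=("$1"); shift ;;
        esac
    done

    # Re-emit each match string with its message. A missing or empty message
    # gets a static default with no line number, so every affected node
    # reports an identical reason string.
    for (( i = 0; i < ${#mstrs[@]}; i++ )); do
        msg="${msgs[i]:-}"
        if [[ -z "$msg" ]]; then
            # Placeholder wording; the real default text may differ.
            msg="dmesg output check failed for match string \"${mstrs[i]}\""
        fi
        NEW_ARGS+=(-m "${mstrs[i]}" -M "$msg")
    done

    # Pass any remaining arguments through unchanged, after the -m/-M pairs.
    NEW_ARGS+=("${others[@]}")
}

# Usage: the first match string has an empty placeholder message and gets the
# default; the second keeps its custom message. Match strings are illustrative.
dmesg_default_messages -m '!/Out of memory/' -m '!/I\/O error/' \
                       -M '' -M 'Disk I/O errors in dmesg'
printf '%s\n' "${NEW_ARGS[@]}"
```

Because the generated message depends only on the match string, the resulting "Reason" text is the same on every node that fails the check, which is what lets the scheduler group them.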
Fixes #143.