mej / nhc

LBNL Node Health Check

Default output to stdout/stderr when the -e option is used #123

Open mej opened 1 year ago

mej commented 1 year ago

(Background) Because it was originally written to interface with resource managers like Slurm and TORQUE, NHC's output is always directed to a log file. (In fact, TORQUE relies on the first few characters of output from the health check command to determine success or failure, so any extraneous output could cause NHC's results to be misinterpreted or mishandled.) The configuration variable LOGFILE controls what specific target should receive the output from NHC. It's typically set to a path+filename somewhere under /var/log (which also means that the default only works for root).

NHC recently acquired the ability to "test drive" a check line by supplying it to nhc via the -e option. I expect, and I think most users would expect, the output of something that's inherently interactive to default to the same terminal/pty device from which I ran nhc -e ... in the first place. At present, though, no special handling is performed, so the default output destination when running a check directly at the shell prompt is identical to that of a run from slurmd, cron, etc. The current behavior is counterintuitive, at least to me.

Moreover, I've seen more than one user run nhc -e ... and get hopelessly confused by what they saw (or didn't). Confused customers are cranky customers.

So let's change the default for -l/LOGFILE to stdout/stderr when the -e argument is used, essentially making nhc -l - the "interactive default" behavior.
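To make the proposal concrete, here is a minimal sketch of the precedence I have in mind. It is not NHC's actual code: the function name choose_logfile, its arguments, and the fallback path /var/log/nhc.log are all hypothetical stand-ins for illustration. The idea is that an explicit -l/LOGFILE setting always wins, -e on its own selects "-" (stdout/stderr), and anything else keeps the existing log file default.

```shell
# Hypothetical helper, not part of NHC: pick the log target.
#   $1  check line supplied via -e (empty if -e was not used)
#   $2  target explicitly requested via -l or LOGFILE (empty if unset)
# Prints the chosen target; "-" means stdout/stderr, as with nhc -l -.
choose_logfile() {
    local eval_line="$1" requested="$2"

    if [ -n "$requested" ]; then
        # An explicit setting always takes precedence.
        printf '%s\n' "$requested"
    elif [ -n "$eval_line" ]; then
        # Interactive "test drive" via -e: default to the terminal.
        printf '%s\n' "-"
    else
        # Resource manager / cron runs: keep logging to a file.
        # The exact path is an assumption for this sketch.
        printf '%s\n' "/var/log/nhc.log"
    fi
}
```

With this ordering, someone who really wants nhc -e ... output in a file can still ask for it with -l, and runs from slurmd or cron are unaffected because they never pass -e.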