mej / nhc

LBNL Node Health Check

sshd check in redhat 9.X fails even though sshd is running #151

Open kdalenberg opened 3 weeks ago

kdalenberg commented 3 weeks ago

The standard check: * || check_ps_service -u root -S sshd

Fails in redhat 9 when sshd service is enabled and running. Debug shows:

[1724788959] - DEBUG: Checking 67117: "sshd" vs. "sshd:"
[1724788959] - DEBUG: Glob match check: sshd: does not match sshd
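For anyone wondering where the trailing colon comes from, here is a minimal sketch. It assumes the sshd listener rewrites its process title so that the first word of the command line is "sshd:"; the sample line below is illustrative, not captured from the affected node, and `first_word` is a made-up helper, not NHC's own ps parsing.

```shell
#!/usr/bin/env bash
# Illustrative only: first_word is a hypothetical helper and the sample line
# is an assumed process title, not output from the reporter's system.
first_word() {
    local cmdline="$1"
    # Drop everything from the first whitespace character onward.
    printf '%s\n' "${cmdline%%[[:space:]]*}"
}

sample='sshd: /usr/sbin/sshd -D [listener] 0 of 10-100 startups'
name=$(first_word "$sample")
echo "first word of command line: $name"

# A glob that expects the name to end in "sshd" rejects "sshd:".
if [[ "$name" == *sshd ]]; then
    echo "glob *sshd matches"
else
    echo "glob *sshd does not match"
fi
```

To see the real command line on a node, something like `ps -o pid=,args= -C sshd` should show it.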

mej commented 1 week ago

The check_ps_service() check has an option, -m, which allows you to specify your own match string in lieu of the default behavior (which is to match any command whose argv[0] ends with the name of the specified service -- hence the *sshd it's using in your example above).

You might try

check_ps_service -u root -m 'sshd:' -S sshd

or

check_ps_service -u root -m '/^sshd:?$/' -S sshd
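To make the difference between these match strings concrete, here is a rough bash sketch. `match_name` is a hypothetical helper written for this comment, not NHC's matching code; it assumes a string wrapped in slashes is treated as a regular expression and anything else as a glob, which is how the examples in this thread read.

```shell
#!/usr/bin/env bash
# Hypothetical helper, NOT NHC's implementation: /.../ is treated as a
# regular expression, anything else as a shell glob.
match_name() {
    local name="$1" pattern="$2" re
    if [[ "$pattern" == /*/ ]]; then
        re="${pattern#/}"       # strip leading slash
        re="${re%/}"            # strip trailing slash
        [[ "$name" =~ $re ]]
    else
        [[ "$name" == $pattern ]]   # unquoted right side is a glob
    fi
}

for name in sshd sshd: /usr/sbin/sshd; do
    for pattern in '*sshd' '*sshd:' 'sshd:' '/^sshd:?$/'; do
        if match_name "$name" "$pattern"; then
            result="match"
        else
            result="no match"
        fi
        printf '%-16s %-14s %s\n' "$name" "$pattern" "$result"
    done
done
```

The glob `*sshd` rejects `sshd:`, while the regular expression accepts the bare name with or without the colon but rejects a full path such as `/usr/sbin/sshd`.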

griznog commented 1 week ago

I've started doing all my service checks via systemctl, e.g.

 * || check_cmd_output -t 2 -r 0 /usr/bin/systemctl is-active sssd

Having a dozen of these doesn't seem to make a meaningful difference to how long my check runs. I've also been thinking about writing a custom health check function that takes a list of services and checks them all with one call to systemctl if/when it does become an issue to call systemctl many times.

griznog


kdalenberg commented 1 week ago

This is the string that got things working ok in redhat 9:

check_ps_service -u root -d sshd: -S sshd




mej commented 6 days ago

> I've started doing all my service checks via systemctl, e.g.
>
>  * || check_cmd_output -t 2 -r 0 /usr/bin/systemctl is-active sssd
>
> Having a dozen of these doesn't seem to make a meaningful difference to how long my check runs. I've also been thinking about writing a custom health check function that takes a list of services and checks them all with one call to systemctl if/when it does become an issue to call systemctl many times.

As I'm sure you remember, when check_ps_service() was originally written, SystemD was relatively new, and NHC needed to support systems all the way back to RHEL/CentOS/SL 4.x. Since the traditional LSB /sbin/service utility supported both SystemD and /etc/init.d/ scripts, that seemed the most straightforward approach. Fast-forward to today, and all "officially supported" platforms for the upcoming 1.5 release of NHC use SystemD. So making the move to systemctl might be prudent.

There's a lot I really love about SystemD, and there's a lot about it that drives me bonkers. But the quantity and usefulness of the verbs supported by systemctl is fantastic IMHO. I think there's a lot that could be done -- either in check_ps_service() or an entirely new check -- to take advantage of systemctl's consistency and feature set. I've already committed to some new features for it, as it's one of the most broadly used and most impactful checks in NHC's arsenal, but I'm keeping an open mind to the possibility that the sanest course of action may wind up being an entirely new check.

Regardless, if you do happen to put together a custom check for systemctl and multiple simultaneous unit validations, I hope you'll submit a PR! 😀
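As a starting point for that idea, here is a minimal sketch of a multi-unit check. The name `check_units_active` and the `SYSTEMCTL` override variable are hypothetical and not part of NHC. It assumes `systemctl is-active` prints one state per line in argument order when given several plain unit names (no glob patterns), and it needs bash 4 or later for `mapfile`. Inside NHC you would presumably report the failure through the framework's own mechanism rather than writing to stderr.

```shell
#!/usr/bin/env bash
# Hypothetical check, not part of NHC. SYSTEMCTL is an assumed override
# variable so the function can be exercised without a real systemd.
check_units_active() {
    local systemctl="${SYSTEMCTL:-/usr/bin/systemctl}"
    local -a units=("$@") states=() failed=()
    local i

    # One systemctl call for all units. Its exit status is ignored on
    # purpose, since it is 0 as long as at least one unit is active.
    mapfile -t states < <("$systemctl" is-active "${units[@]}" 2>/dev/null)

    # Pair each unit with its reported state (assumes argument order).
    for i in "${!units[@]}"; do
        if [[ "${states[i]:-unknown}" != "active" ]]; then
            failed+=("${units[i]}=${states[i]:-unknown}")
        fi
    done

    if (( ${#failed[@]} > 0 )); then
        echo "check_units_active: not active: ${failed[*]}" >&2
        return 1
    fi
    return 0
}
```

Called as `check_units_active sshd sssd chronyd`, it returns 0 only when every listed unit reports `active`.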

> This is the string that got things working ok in redhat 9:
>
> check_ps_service -u root -d sshd: -S sshd

Great! Glad you got it working. Just something to keep in mind: -d sshd: is exactly equivalent to -m '*sshd:', and in most cases that's the right choice; using the -m option directly merely gives greater control over exactly which process names will/won't be matched. (For example, my 2nd suggestion above uses a regular expression in order to match the sshd process with or without the trailing :.)