mej / nhc

LBNL Node Health Check

Using something like an 'include' directive? or better with templates #138

Open · jbaksta opened 1 year ago

jbaksta commented 1 year ago

I have an odd need to run probably 3 (at least 2) different configs for NHC on a node, at different points in its boot/triage cycle. Many of the checks are the same across all of them. Maybe this is better suited to a templating engine, but I'd prefer to keep the configs in one place, generated by the templating engine we already have, and then do an include. An example might be:

Production NHC:

include /etc/nhc/pre_cfg_checks
include /etc/nhc/post_cfg_checks
include /etc/nhc/prod_checks

A precursor (pre-config) run would then just be:

include /etc/nhc/pre_cfg_checks

Again, this may be more elegant in a templating engine like ERB or Jinja2. Just posing a thought, I guess. I could also be missing something totally obvious.

mej commented 11 months ago

I have thought about this some in the past, so I'm curious to hear your thoughts on my thoughts!

Since NHC is written in Bash, we can already take advantage of some (potentially obscure?) Bash features to accomplish this. For example, NHC has a command-line option, -c, that takes an arbitrary filename to use as its configuration. To combine 3 configs into 1 for a given run, we can use a Bash feature called process substitution to provide a dynamic "config file" to NHC, like this:

nhc -c <(cat /etc/nhc/{pre_cfg,post_cfg,prod}_checks) ...

Bash replaces that expression with a special file path referring to a file descriptor, so nhc ends up seeing something like nhc -c /dev/fd/63. The shell that launches nhc runs whatever command(s) appear inside the <( ... ) expression and connects the output of that pipeline/list to the nhc child process on that file descriptor (63 in this case).
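To see the mechanism without a real NHC install, here is a minimal sketch. It writes throwaway copies of the three fragments to a temp directory and uses a hypothetical show_conf() function as a stand-in for nhc -c, printing the path it receives and then reading from it. The temp directory and show_conf() are assumptions for illustration, and the exact /dev/fd number may vary between systems.

```shell
#!/bin/bash
# Demo of process substitution as a "virtual" config file. The temp directory
# and show_conf() (a stand-in for "nhc -c FILE") are illustrative assumptions.
set -euo pipefail

conf_dir=$(mktemp -d)
trap 'rm -rf "$conf_dir"' EXIT
echo '# pre-config checks'  > "$conf_dir/pre_cfg_checks"
echo '# post-config checks' > "$conf_dir/post_cfg_checks"
echo '# production checks'  > "$conf_dir/prod_checks"

# Stand-in for nhc: report the path passed to -c, then read from it.
show_conf() {
    echo "config path: $2"
    cat "$2"
}

# Bash substitutes a /dev/fd/N path (or a named pipe on systems without
# /dev/fd) for <( ... ); the output of cat arrives on that path.
show_conf -c <(cat "$conf_dir"/{pre_cfg,post_cfg,prod}_checks)
```

The consumer never knows the "file" is a pipe, which is why nhc -c accepts it like any other config path. One caveat: the data can only be read once, front to back.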

There are other similar "tricks" we can use...but you get the idea. 😄

Here at LANL, though, we currently use Ansible for doing config management, and our NHC role supplies the desired checks for each context as a named YAML list and a Jinja2 template that iterates over them. If you already have a templating engine as part of your existing CM infrastructure, using that is probably the sanest option.

Having said all that, if there's sufficient interest, adding an "include" feature to NHC itself should be relatively simple: have nhc_load_conf() look for the include directive and call itself recursively on the specified path/filename being included.
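A minimal sketch of the recursive idea, written as a standalone preprocessor rather than as a change to nhc_load_conf() (whose internals aren't shown in this thread). The expand_includes() function and the MAX_INCLUDE_DEPTH limit are hypothetical. It assumes an include line holds a single path with no spaces, and it gives up past a fixed depth so that a file including itself can't recurse forever.

```shell
#!/bin/bash
# Hypothetical preprocessor: expands "include <path>" lines recursively and
# prints the flattened config on stdout. This is NOT NHC code;
# expand_includes() and MAX_INCLUDE_DEPTH are illustrative names.
MAX_INCLUDE_DEPTH=10

expand_includes() {
    local file=$1 depth=${2:-0} line
    # Assumed syntax: "include" followed by one path containing no spaces.
    local re='^[[:space:]]*include[[:space:]]+([^[:space:]]+)[[:space:]]*$'

    # Guard against include loops (e.g. a file that includes itself).
    if (( depth > MAX_INCLUDE_DEPTH )); then
        echo "expand_includes: include depth exceeded at $file" >&2
        return 1
    fi
    if [[ ! -r $file ]]; then
        echo "expand_includes: cannot read $file" >&2
        return 1
    fi

    # "|| [[ -n $line ]]" keeps a final line that lacks a trailing newline.
    while IFS= read -r line || [[ -n $line ]]; do
        if [[ $line =~ $re ]]; then
            # Recurse one level deeper; stop on the first failure.
            expand_includes "${BASH_REMATCH[1]}" $((depth + 1)) || return 1
        else
            printf '%s\n' "$line"
        fi
    done < "$file"
}
```

Because it writes to stdout, it composes with the process substitution trick, e.g. nhc -c <(expand_includes /etc/nhc/prod.conf), where prod.conf is a hypothetical top-level file holding the include lines from the original example. Include paths are used as written (relative ones resolve against the current directory, not the including file), which is a simplifying choice.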

Any thoughts?

jbaksta commented 11 months ago

I also thought about using process substitution, but was avoiding it because we launch NHC from systemd, and I wasn't planning on prefixing the execution with bash -c. I suppose that'd be trivial to change, though.
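A minimal sketch of that change: systemd does not run ExecStart= through a shell, so the process substitution has to happen inside a bash process, for example a small wrapper script that the unit points at. The script path in the comment, run_nhc_prod(), and the NHC_BIN/NHC_CONF_DIR overrides are all hypothetical; they are not NHC settings.

```shell
#!/bin/bash
# Hypothetical wrapper for a systemd unit, e.g.
#   ExecStart=/usr/local/sbin/nhc-prod
# systemd does not pass ExecStart= through a shell, so <( ... ) only works
# inside a bash process such as this one. NHC_BIN and NHC_CONF_DIR are
# illustrative overrides, not NHC settings.
run_nhc_prod() {
    local nhc=${NHC_BIN:-nhc} conf_dir=${NHC_CONF_DIR:-/etc/nhc}
    # Concatenate the three fragments and hand them to nhc as one config;
    # any extra arguments are passed through unchanged.
    "$nhc" -c <(cat "$conf_dir"/{pre_cfg,post_cfg,prod}_checks) "$@"
}

# A real wrapper script would end with:
#   run_nhc_prod "$@"
```

A wrapper keeps shell quoting out of the unit file; prefixing ExecStart= with bash -c, as mentioned above, should work as well.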

I've also started implementing it with Ansible and Jinja2 templating, which honestly let me be a little more efficient (or lazy) about writing the config. I think that's probably the better approach at our site for now, for the most part. I'd be curious, though, how many other sites are using a templating engine, or conversely, where an include statement would be valuable.

I'll have to glance at the nhc_load_conf() function. Mostly I'm seeing whether others would find an include statement worthwhile. I'd think some additional error checking, or at least warning messages, would need to be there (e.g., File Not Found), along with a decision on whether to continue or treat it as fatal.
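One way the missing-file case could be handled, as a sketch: a check that either warns and lets the caller skip the include, or fails hard, depending on a policy setting. The check_include_target() function, the NHC_INCLUDE_MISSING variable, and its warn/fatal values are hypothetical, not existing NHC options.

```shell
#!/bin/bash
# Hypothetical missing-include policy check; check_include_target() and
# NHC_INCLUDE_MISSING (warn|fatal) are illustrative, not NHC options.
# Return codes: 0 = include the file, 1 = skip it and continue, 2 = abort.
check_include_target() {
    local path=$1 policy=${NHC_INCLUDE_MISSING:-warn}
    if [[ -r $path ]]; then
        return 0
    fi
    if [[ $policy == fatal ]]; then
        echo "ERROR: include file not found or unreadable: $path" >&2
        return 2
    fi
    echo "WARNING: include file not found or unreadable, skipping: $path" >&2
    return 1
}
```

A loader would read the file on 0, move on to the next config line on 1, and stop on 2. Defaulting to warn is just one choice; a site that treats a missing fragment as a broken config would set fatal.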