hashicorp / nomad

Nomad is an easy-to-use, flexible, and performant workload orchestrator that can deploy a mix of microservice, batch, containerized, and non-containerized applications. Nomad is easy to operate and scale and has native Consul and Vault integrations.
https://www.nomadproject.io/

proposal: plugin system for offloading nomad service discovery health checks #18361

Open shoenig opened 1 year ago

shoenig commented 1 year ago

Nomad 1.4 added support for health checks of services in Nomad's built-in service discovery. However, the resulting data of those health checks is only ever stored on individual Nomad clients. This is unlike Consul, which replicates that data up to the Consul control plane and stores it in Consul's Raft store. Consul can then make the health check data queryable through its API.
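For comparison, here is roughly what reading that replicated data from Consul looks like with the official Consul Go client (the service name is illustrative):

```go
// Reading Consul's replicated health-check data via its Go API client.
// The service name "web" is illustrative.
package main

import (
	"fmt"
	"log"

	consul "github.com/hashicorp/consul/api"
)

func main() {
	client, err := consul.NewClient(consul.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}

	// List all health checks registered for the "web" service.
	checks, _, err := client.Health().Checks("web", nil)
	if err != nil {
		log.Fatal(err)
	}
	for _, c := range checks {
		fmt.Printf("%s on %s: %s\n", c.CheckID, c.Node, c.Status)
	}
}
```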

The Nomad use case has so far been specific to managing deployment roll-outs, where making the results of health checks available for consumption hasn't been necessary. It would be a nice feature to add though, as it would open up many other use cases.

However, I think we should deviate from the architecture Consul went with, for two reasons.

The first is the "cliff" problem during cluster-wide outages. In a steady, healthy state, the control plane receives only minimal updates from agents, since repeated healthy check results are discarded. In a cluster outage scenario, all of those checks may transition to unhealthy at once, suddenly ramping up the volume of updates sent to the control plane and crushing the Raft state store with new data right at the worst possible time.

The second is that we don't want to encourage the use of Nomad's API in the critical path of business logic. By exposing health check data we open the door to consumers of that data for whatever purpose, and Nomad is probably not the right type of data store for many of those consumers.

Instead, what if Nomad added a new plugin type for offloading health check data? The plugin interface could be implemented to push the health check data into more suitable data stores like Redis or MySQL, or into a message queue like Kafka or Pub/Sub. In this scenario consumers could fine-tune the storage and access patterns to their own infrastructure, while adding no burden at all to the Nomad control plane.

One could imagine the plugin RPC interface would be something simple, like

Publish(CheckResult)
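A rough Go sketch of that shape, just to make the idea concrete (the type and field names here are illustrative, not an existing Nomad API):

```go
// Illustrative sketch only; these names are not an existing Nomad API.
package checksink

import "time"

// CheckResult carries the outcome of one service health check run on a client.
type CheckResult struct {
	AllocID   string // allocation the service instance belongs to
	JobID     string
	Group     string
	Service   string
	Check     string
	Status    string // e.g. "success" or "failure"
	Output    string // raw check output
	Timestamp time.Time
}

// Sink is what an offloading plugin would implement; Nomad clients would call
// Publish for each check result instead of replicating it to the servers.
type Sink interface {
	Publish(result CheckResult) error
}
```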

We could of course provide a reference implementation (probably for Redis), and a production-ready Nomad pack to deploy it.

MattJustMatt commented 1 year ago

I'm liking this idea and appreciate the separation of concerns between business logic and the raft state store.

One of the things that was nice about our prior approach (polling /health endpoints) is that health checks were treated as a "sub-resource" of the service (or allocation).

With this, we could use Redis streams generally (or, better for our use case, a hash that is idempotently updated with the JSON blob from our health check), but it would be up to us to correlate the broader job state with the state in Redis. For example, if we stop a job, we'd need to signal (or TTL-delete) the Redis entry.
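To make that concrete, here is a rough sketch of the hash-plus-TTL approach, written as a hypothetical implementation of the Publish interface sketched above and using the go-redis client; the key layout and TTL are assumptions:

```go
// Hypothetical Redis-backed sink; key layout and TTL are assumptions.
package checksink

import (
	"context"
	"encoding/json"
	"time"

	"github.com/redis/go-redis/v9"
)

type RedisSink struct {
	rdb *redis.Client
	ttl time.Duration // entries for stopped allocations simply age out
}

func NewRedisSink(addr string, ttl time.Duration) *RedisSink {
	return &RedisSink{
		rdb: redis.NewClient(&redis.Options{Addr: addr}),
		ttl: ttl,
	}
}

// Publish idempotently updates one hash per allocation, one field per check,
// and refreshes the TTL so entries disappear if results stop arriving.
func (s *RedisSink) Publish(result CheckResult) error {
	ctx := context.Background()
	key := "nomad:checks:" + result.AllocID

	blob, err := json.Marshal(result)
	if err != nil {
		return err
	}
	if err := s.rdb.HSet(ctx, key, result.Check, blob).Err(); err != nil {
		return err
	}
	return s.rdb.Expire(ctx, key, s.ttl).Err()
}
```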

Also, separately but related: I assume the allocation ID (or other alloc info) can be passed as part of the RPC interface?

--

Jumping back a level because I want to make sure I'm not continuing to push in a bad direction: is it appropriate to query Nomad allocations and reflect that data in Redis (e.g. to be consumed by a matchmaking service), or should this be entirely out of bounds (task executables should report their state directly to Redis, bypassing Nomad altogether)?

As a mental model, I'm viewing game matchmaking as "service discovery++" and was hoping to use Nomad as the "base layer" (what instances exist for which games) and mirror that data to Redis for fast sorted operations on player lists per allocation, etc.
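For illustration only, here is a rough sketch of the mirroring half of that idea, assuming the official Nomad Go API client and go-redis; the job name, key names, and polling loop are placeholders, not a recommendation either way:

```go
// Hypothetical mirroring loop: read allocation state from Nomad's API and
// reflect it into Redis for a matchmaking service to consume.
package main

import (
	"context"
	"log"
	"time"

	nomad "github.com/hashicorp/nomad/api"
	"github.com/redis/go-redis/v9"
)

func main() {
	nc, err := nomad.NewClient(nomad.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}
	rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})
	ctx := context.Background()

	for {
		// List allocations for the hypothetical "game-servers" job.
		allocs, _, err := nc.Jobs().Allocations("game-servers", false, nil)
		if err != nil {
			log.Println("list allocations:", err)
			time.Sleep(10 * time.Second)
			continue
		}
		for _, a := range allocs {
			// Keep a Redis set of running game-server allocations in sync.
			if a.ClientStatus == "running" {
				rdb.SAdd(ctx, "matchmaking:instances", a.ID)
			} else {
				rdb.SRem(ctx, "matchmaking:instances", a.ID)
			}
		}
		time.Sleep(10 * time.Second)
	}
}
```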