Nomad is an easy-to-use, flexible, and performant workload orchestrator that can deploy a mix of microservice, batch, containerized, and non-containerized applications. Nomad is easy to operate and scale and has native Consul and Vault integrations.
As far as I know, the only way to programmatically check the status of a driver on a Nomad client is to query the /v1/node/:node_id API endpoint and inspect the response. If a driver fails but the cluster has capacity to place the workload on another node, the driver failure can go unnoticed.
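For reference, here is a rough sketch of what that polling looks like today. It assumes the node response carries a `Drivers` map whose entries have `Detected`, `Healthy`, and `HealthDescription` fields. The function names and the agent address in the usage note are my own placeholders, not anything Nomad ships.

```python
import json
import urllib.request


def unhealthy_drivers(node: dict) -> dict:
    """Return {driver_name: description} for drivers that are detected but not healthy."""
    problems = {}
    for name, info in (node.get("Drivers") or {}).items():
        # Skip drivers that were never detected on this node; they are absent, not broken.
        if info.get("Detected") and not info.get("Healthy"):
            problems[name] = info.get("HealthDescription", "")
    return problems


def fetch_node(nomad_addr: str, node_id: str) -> dict:
    """GET /v1/node/:node_id from the Nomad HTTP API and decode the JSON body."""
    with urllib.request.urlopen(f"{nomad_addr}/v1/node/{node_id}") as resp:
        return json.load(resp)
```

Something like `unhealthy_drivers(fetch_node("http://127.0.0.1:4646", node_id))` would then have to run on a schedule for every node, which is the part that feels heavier than it should be.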
It would be helpful to have an easier way to monitor the health of the drivers on a Nomad client node, which could in turn be integrated into an alerting system. One option would be to register each detected driver in Consul as a health check under the Nomad client's catalog entry. The check could then be updated as the driver's health changes, making the cluster easier to operate and driver problems easier to observe.
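To make the idea concrete, here is a minimal sketch of how the driver state could map onto Consul's agent check endpoints (`PUT /v1/agent/check/register` and `PUT /v1/agent/check/update/:check_id`). This is only an illustration of the proposal, not how Nomad works today. The check naming scheme, the `service_id` parameter, and the helper names are hypothetical, and it assumes each driver entry exposes `Healthy` and `HealthDescription`.

```python
import json
import urllib.request


def check_id(driver: str) -> str:
    # Hypothetical naming scheme for the per-driver check.
    return f"nomad-driver-{driver}"


def register_payload(driver: str, service_id: str, ttl: str = "30s") -> dict:
    """Body for PUT /v1/agent/check/register: a TTL check attached to an existing service."""
    return {
        "ID": check_id(driver),
        "Name": f"Nomad driver: {driver}",
        "TTL": ttl,
        "ServiceID": service_id,  # assumed: the ID of the Nomad client's service in Consul
    }


def update_payload(driver_info: dict) -> dict:
    """Body for PUT /v1/agent/check/update/:check_id, derived from one driver entry."""
    status = "passing" if driver_info.get("Healthy") else "critical"
    return {"Status": status, "Output": driver_info.get("HealthDescription", "")}


def consul_put(consul_addr: str, path: str, body: dict) -> None:
    """Send a JSON body to the local Consul agent with an HTTP PUT."""
    req = urllib.request.Request(
        f"{consul_addr}{path}",
        data=json.dumps(body).encode(),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req).close()
```

A TTL check turns critical on its own if it is not refreshed in time, so a client that stops reporting would also show up as failing, which seems like the right default for alerting.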
cc @stevenscg