I was excited to see this project after seeing https://github.com/netbox-community/netbox/issues/8831. However, I was a bit surprised that the health check view/URL returns a full web page intended for humans, which does not seem to be the intent of that original issue.
I think the human-readable status page is a great addition, but health checks made by automated systems such as load balancers call for something much simpler.
I propose adding a /_health endpoint (or similar) that returns a plain-text response of OK (status code 200) or NOT OK (status code 500). Additional statuses such as DEGRADED could be added if desired, although I fail to see where that would be useful.
The simplest implementation just returns OK as a string. Later enhancements could include DB and Redis connection stats, or even counters of failed responses. I recommend starting simple unless something easy to implement would give a better signal.
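To make the proposal concrete, here is a rough, framework-agnostic sketch of the behaviour as a plain WSGI app. It is not how the plugin is implemented; in the plugin this would more likely be a small Django view. The names `make_health_app` and `Check` are hypothetical, and the optional checks stand in for the later DB/Redis enhancements.

```python
from typing import Callable, Iterable

# Hypothetical check type: a callable that returns True when a dependency
# (for example the DB or Redis) is reachable. None are required.
Check = Callable[[], bool]


def make_health_app(checks: Iterable[Check] = ()):
    """Build a WSGI app that answers /_health with OK (200) or NOT OK (500)."""
    checks = tuple(checks)

    def app(environ, start_response):
        if environ.get("PATH_INFO") != "/_health":
            status, body = "404 Not Found", b"Not Found"
        else:
            try:
                # With no checks configured, all() is True, so the
                # endpoint simply returns OK.
                healthy = all(check() for check in checks)
            except Exception:
                # A check that raises counts as unhealthy.
                healthy = False
            if healthy:
                status, body = "200 OK", b"OK"
            else:
                status, body = "500 Internal Server Error", b"NOT OK"

        start_response(
            status,
            [
                ("Content-Type", "text/plain; charset=utf-8"),
                ("Content-Length", str(len(body))),
            ],
        )
        return [body]

    return app
```

Starting with an empty list of checks gives the "just return OK" behaviour; checks can be added later without changing the response format.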
Use case
The intention is to return a small payload that indicates the health of a system. Multi-kilobyte payloads (like a fully templated page) are poorly suited to automated systems that check health. The scenarios are the same ones listed in https://github.com/netbox-community/netbox/issues/8831.
The longer a response takes to return, the more delayed and less useful the health check is for identifying a healthy node.
Many of these systems store the response and don't scale well when the response is large (e.g. templated HTML).
Multi-instance NetBox deployments behind a load balancer can route away from unhealthy nodes.
During a deployment with k8s or other systems, the health check endpoint is used to determine whether the deployment succeeded. If it did not, the deployment is rolled back automatically, which means deployments can be automated.
Although the existing "pretty" page at /healthcheck works for this use case, it will not work in my environment, where health check response size is important.
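For context, this is roughly what the consuming side looks like. The sketch below is only an illustration of how a load balancer or deployment gate might probe the proposed endpoint; `response_is_healthy` and `probe` are hypothetical names, and the URL in the usage note is a placeholder. Real load balancers and k8s probes have their own configuration for this.

```python
import http.client
import urllib.error
import urllib.request


def response_is_healthy(status: int, body: bytes) -> bool:
    """Treat only a 200 response whose body is exactly OK as healthy."""
    return status == 200 and body.strip() == b"OK"


def probe(url: str, timeout: float = 2.0) -> bool:
    """Fetch the health URL once; any error or slow response is unhealthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            # Read only a few bytes: the payload is expected to be tiny.
            return response_is_healthy(resp.status, resp.read(16))
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Covers non-2xx statuses (HTTPError), refused connections,
        # timeouts, and malformed responses.
        return False
```

A caller would run something like `probe("http://netbox.example.com/_health")` on an interval and route away from (or roll back) a node after repeated failures.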
NetBox HealthCheck Plugin version
Latest?
NetBox version
3.0.12
Feature type
Change to existing functionality
External dependencies
No response