Closed bixu closed 2 years ago
~The Refinery API only has the /alive endpoint. The docs should be updated to remove the /x/alive endpoint.~
The /x/alive endpoint is proxied to the Honeycomb API, so it lets you verify whether the Refinery cluster can communicate with the Honeycomb service.
I'm unsure whether a proxied request is a good basis for deciding whether a node is available. Refinery nodes are designed to recover from intermittent network outages.
For now, I think using the /x/alive endpoint as the availability check is not a good idea. We have seen Refinery struggle to cope with irregular cluster topology changes, and this could exacerbate the problem. Plus, if there were a Honeycomb API outage, we wouldn't want a Refinery cluster to take itself down. The cluster nodes should stay up and use other tools (e.g. retries and memory limiting) to protect themselves until they can deliver telemetry to Honeycomb.
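To make the distinction concrete, here is a rough sketch of the behaviour I'm arguing for: node liveness is decided from the local /alive check only, while /x/alive is treated as an informational signal about upstream connectivity. This is not Refinery or chart code. The names `ProbeResult`, `fetch_status`, `node_is_live` and `upstream_reachable` are made up for illustration, and the base URL `http://localhost:8080` is an assumption about where a Refinery node is listening.

```python
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProbeResult:
    # HTTP status from /alive, or None if the request failed entirely.
    local_status: Optional[int]
    # HTTP status from /x/alive (proxied upstream), or None on failure.
    upstream_status: Optional[int]


def fetch_status(url: str, timeout: float = 2.0) -> Optional[int]:
    """Return the HTTP status code for url, or None if it is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, just not with a 2xx status.
        return err.code
    except OSError:
        # Covers URLError, connection refused, and timeouts.
        return None


def probe(base_url: str) -> ProbeResult:
    """Query both endpoints on one node (base_url is an assumed address)."""
    return ProbeResult(
        local_status=fetch_status(f"{base_url}/alive"),
        upstream_status=fetch_status(f"{base_url}/x/alive"),
    )


def node_is_live(result: ProbeResult) -> bool:
    """Liveness depends only on the node's own /alive endpoint."""
    return result.local_status == 200


def upstream_reachable(result: ProbeResult) -> bool:
    """Informational only: can the node reach the upstream API right now?"""
    return result.upstream_status == 200


if __name__ == "__main__":
    result = probe("http://localhost:8080")  # assumed listen address
    print("node live:", node_is_live(result))
    print("upstream reachable:", upstream_reachable(result))
```

With this split, an upstream outage shows up as `upstream_reachable` being false (something to alert on) without the node ever being marked dead and restarted.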
https://github.com/honeycombio/helm-charts/blob/555231a5ffaf59c815397007a48726f434b81132/charts/refinery/templates/deployment.yaml#L84
I'm comparing the line above to the docs: https://docs.honeycomb.io/manage-data-volume/refinery/scale-and-troubleshoot/#xalive