[Enhancement][opensearch] Semantic health probes

lieberlois commented 1 year ago

Is your feature request related to a problem? Please describe. Its frustrating that the OpenSearch API distinguishes between different cluster states (red/yellow/green) which makes liveness / readiness probes semantically useless during a rolling restart, since a second opensearch node is restarted, before all shards have been moved.

Describe the solution you'd like I want an endpoint that only responds OK when the cluster health is green (or maybe yellow). One idea would be to add a small sidecar container as an adapter that interprets the response from GET _cluster/health which looks like this:

{
  "cluster_name": "opensearch-cluster",
  "status": "yellow",
  ...
}

I would suggest to implement a small Go service since it can be written in very few LoC and is very lightweight. Alternatively we could use a curl command along the lines of GET _cluster/health?wait_for_status=green&timeout=1 since it times out when the cluster state is not green and hence fails the readiness probe check. For both cases we somehow need to add a user that is allowed to call the endpoint (or use a dedicated client certificate which probably complicates the setup).

Describe alternatives you've considered Using the Kubernetes Operator for OpenSearch is an option.

Additional context I contributed #329 and #333 and can also assist in the implementation of this if others also think this is a good idea.

smlx commented 1 year ago

There was an extensive discussion about this issue on this PR: https://github.com/opensearch-project/helm-charts/pull/172

Please take a look and see if the discussion covers this issue.

lieberlois commented 1 year ago

Thanks @smlx, I read through the discussion :) I dont think using client cetificates was evaluated in the discussion. I would suggest to have a client cert, for whose CN / DN only the GET _cluster/health is allowed. This way we should be able to achieve a secure passwordless health check. What do you think?

smlx commented 1 year ago

One problem with using the _cluster/health endpoint for a readiness probe is that if the cluster goes unhealthy for some reason then Kubernetes will stop sending traffic to any node. This makes it very difficult to recover the cluster, since you can't hit the API through the service endpoint.

lieberlois commented 1 year ago

@smlx Good Point, but we can fix this by only using a startupProbe, which only runs on the first start of the pod 🤔 I see why its a tricky topic, its just that we sometimes get a red cluster state because of the rolling restart behaviour (e.g. 3 nodes, node 1 restarted and not fully initialized yet, then node 2 also reboots -> Cluster Red)

smlx commented 1 year ago

we can fix this by only using a startupProbe

But in that case imagine that enough pods go offline for the cluster to go red (e.g. nodes crash). How will the replacement pods rejoin the cluster, since the replacement pods will wait on the startup probe for the cluster to go green but it never will?

lieberlois commented 12 months ago

Mhm good point, i thought about using this startup probe in our project:

CERTS="/usr/share/opensearch/config/certs/admin/"
curl --fail-with-body --cert "$CERTS/tls.crt" --key "$CERTS/tls.key" -k "https://localhost:9200/_cluster/health?wait_for_status=green&timeout=1s"

How do you guys restart your cluster (also @smlx)? In our case, the cluster will at some point go red, because K8s doesnt wait for the cluster to be green again after restarting a node, it will just restart the next sts pod.

lieberlois commented 12 months ago

Another idea, we could also add a preStop lifecycle hook that either drains the OpenSearch node or makes sure that the cluster health is green before restarting it. What do you think of that? Basically i suggest to add sane defaults for this preStart hook (or at least an example on how to do it) here:

https://github.com/opensearch-project/helm-charts/blob/main/charts/opensearch/values.yaml#L409

Edit: an idea would be to just provide it as an example on how sb might implement this.

smlx commented 12 months ago

Yeah that might work, I like the idea. There also might be something we could do as a pre-upgrade hook in the helm chart, maybe in combination with a pod preStart or preStop hook

Honestly speaking, in the interest of a speedy upgrade we just incur a short window of the cluster going red.

opensearch-project / helm-charts

[Enhancement][opensearch] Semantic health probes #474