syself / cluster-api-provider-hetzner

Cluster API Provider Hetzner :rocket: The best way to manage Kubernetes clusters on Hetzner, fully declarative, Kubernetes-native and with self-healing capabilities
https://caph.syself.com
Apache License 2.0

support http healthcheck for api lb #1339

Closed: simonostendorf closed this issue 3 months ago

simonostendorf commented 5 months ago

/kind feature

Describe the solution you'd like
I would like to use an HTTP healthcheck for the API load balancer. This prevents the healthcheck from passing before the apiserver itself is ready: the port is opened before the apiserver is up, so a TCP healthcheck reports healthy too early.

Anything else you would like to add:
Maybe this could be added like this:

spec:
  controlPlaneLoadBalancer:
    healthcheck:
      type: http # (or tcp / https (http with tls enabled))
      interval: 15s
      timeout: 10s
      retries: 3
      domain: ""
      path: "/readyz"

If omitted, the Hetzner default values should be used (TCP, 15s interval, 10s timeout, 3 retries).
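For illustration, here is one way the proposed block could be modelled as Go API types on the HetznerCluster side. This is purely a sketch: the type name, field names, and kubebuilder markers are my assumption, not existing caph code.

package v1beta1

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// LoadBalancerHealthCheckSpec is a hypothetical shape for the proposed
// controlPlaneLoadBalancer.healthcheck block; names and defaults mirror
// the YAML suggestion above and are illustrative only.
type LoadBalancerHealthCheckSpec struct {
	// Type selects the protocol of the Hetzner LB health check.
	// +kubebuilder:validation:Enum=tcp;http;https
	// +kubebuilder:default=tcp
	Type string `json:"type,omitempty"`

	// Interval between two consecutive health checks.
	// +kubebuilder:default="15s"
	Interval metav1.Duration `json:"interval,omitempty"`

	// Timeout of a single health check.
	// +kubebuilder:default="10s"
	Timeout metav1.Duration `json:"timeout,omitempty"`

	// Retries before a target is marked unhealthy.
	// +kubebuilder:default=3
	Retries int `json:"retries,omitempty"`

	// Domain and Path are only evaluated for http/https checks.
	Domain string `json:"domain,omitempty"`
	Path   string `json:"path,omitempty"`
}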

guettli commented 5 months ago

@simonostendorf please elaborate a bit more on the use case in which you encountered the error. What kind of cluster did you create, and how many control planes?

Do you have logs/events/conditions where the error is visible?

You propose to extend the CRD HetznerCluster, is that correct?

simonostendorf commented 5 months ago

You propose to extend the CRD HetznerCluster, is that correct?

Yes.

please elaborate a bit more on the use case in which you encountered the error. What kind of cluster did you create, and how many control planes?

I didn't get any errors as far as I can remember, but I have heard that other providers (e.g. OpenStack) have problems during cluster updates where the API becomes unavailable because the LB adds the nodes before the apiserver is ready. To prevent errors like this, and to use all the features provided by the hcloud LB, I think this should be added to HetznerCluster, as it is a better check than TCP. I usually use three hcloud nodes as the control plane.

janiskemper commented 5 months ago

I have heard that other providers (e.g. OpenStack) have problems during cluster updates where the API becomes unavailable because the LB adds the nodes before the apiserver is ready

@simonostendorf I hope that wouldn't be a problem here ;)

https://github.com/syself/cluster-api-provider-hetzner/blob/741b61f0fdadf2231819c7ce44add204111150ff/pkg/services/hcloud/server/server.go#L296

simonostendorf commented 5 months ago

But what do you do if the apiserver becomes unhealthy later and the port is still reachable via tcp?

The initial problem is solved. I didn't check the code, sorry :)

guettli commented 5 months ago

@simonostendorf I don't have all the details in mind, but overall the code is readable. Could you please have a look at the code?

If you don't understand the code, then please tell us! The code should be readable; if comments are missing, we want to add them.

Is that ok for you?

janiskemper commented 5 months ago

@guettli what would be your answer to this question here?

But what do you do if the apiserver becomes unhealthy later and the port is still reachable via tcp?

batistein commented 3 months ago

If this is the case, you should usually have a handler in your stack to detect this and mark the node as unhealthy. Then you can configure machine health checks or a custom remediation strategy to fix this problem.

Usually the port will be unavailable as well, so the LB healthcheck will notice this and stop routing traffic to that node. This would be a very rare edge case: it would only happen if the port is still available, and only for the time until your stack detects the problem; even then it would affect 1/3 or 1/5 of the connections for a short period of time.

When using Syself Autopilot, you would be affected on 1/5 of the connections for about 3 minutes. Actually, I have never seen a hanging port on the kube-apiserver that was not detected by the load balancer health check, so in the real world I guess it would be lower than 30s. We also use a reboot strategy, so the node will probably be back up in less than a minute in most cases.
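For reference, here is a minimal sketch of the kind of MachineHealthCheck batistein mentions, built with the Cluster API Go types and printed as YAML. Cluster name, namespace, label selector, and thresholds are placeholders, and the field names follow the cluster.x-k8s.io/v1beta1 API as I understand it, so double-check them against your Cluster API version.

package main

import (
	"fmt"
	"time"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
	clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"
	"sigs.k8s.io/yaml"
)

func main() {
	maxUnhealthy := intstr.FromString("40%")

	// Remediate control-plane machines whose Node stays NotReady or Unknown
	// for more than 5 minutes.
	mhc := clusterv1.MachineHealthCheck{
		TypeMeta: metav1.TypeMeta{
			APIVersion: "cluster.x-k8s.io/v1beta1",
			Kind:       "MachineHealthCheck",
		},
		ObjectMeta: metav1.ObjectMeta{
			Name:      "my-cluster-control-plane-unhealthy-5m", // placeholder
			Namespace: "default",                               // placeholder
		},
		Spec: clusterv1.MachineHealthCheckSpec{
			ClusterName: "my-cluster", // placeholder
			Selector: metav1.LabelSelector{
				MatchLabels: map[string]string{
					"cluster.x-k8s.io/control-plane": "",
				},
			},
			MaxUnhealthy: &maxUnhealthy,
			UnhealthyConditions: []clusterv1.UnhealthyCondition{
				{Type: corev1.NodeReady, Status: corev1.ConditionUnknown, Timeout: metav1.Duration{Duration: 5 * time.Minute}},
				{Type: corev1.NodeReady, Status: corev1.ConditionFalse, Timeout: metav1.Duration{Duration: 5 * time.Minute}},
			},
		},
	}

	out, err := yaml.Marshal(mhc)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}

In practice you would apply the equivalent YAML manifest; the point is simply that detecting and replacing an unhealthy control plane is handled at the Cluster API layer rather than by the load balancer.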

guettli commented 3 months ago

@simonostendorf we have not heard from you for some weeks. Were you able to solve your issue? Please tell us!

I will close the issue now. Please re-open it or create a new issue if something is not working or not easy to understand. Thank you!

simonostendorf commented 3 months ago

@simonostendorf we have not heard from you for some weeks. Were you able to solve your issue? Please tell us!

I will close the issue now. Please re-open it or create a new issue if something is not working or not easy to understand. Thank you!

Sorry for the late reply. I haven't had the time to test the LBaaS failure behavior yet. I will reopen this issue if I run into any problems with that.

simonostendorf commented 1 month ago

@guettli

I have tested this now: a node is added to the LB after it is marked "ready", or immediately if it is the first node (this is the result of my test; I haven't checked whether this is intended and correct).

This works for adding the node to the LB.

But I could not figure out what happens if the apiserver of, say, node-3 goes down from time to time. An HTTP healthcheck would catch this and remove the node from load balancing. The caph controller does not seem to do this. So supporting HTTP healthchecks may still be a good idea, but it is not as important as I first thought.
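As a side note, what such an HTTP(S) health check would effectively do is probe the apiserver's /readyz endpoint. A minimal sketch of that probe, assuming a placeholder hostname and the default apiserver port 6443; certificate verification is skipped because an external LB typically does not trust the cluster CA:

package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
	"time"
)

// checkReadyz performs the kind of probe an HTTPS health check on the
// control-plane load balancer would run: GET /readyz on the apiserver port
// and treat anything other than 200 as unhealthy.
func checkReadyz(host string) error {
	client := &http.Client{
		Timeout: 10 * time.Second,
		Transport: &http.Transport{
			// The LB has no way to trust the cluster CA, so verification is skipped.
			TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
		},
	}
	resp, err := client.Get(fmt.Sprintf("https://%s:6443/readyz", host))
	if err != nil {
		return fmt.Errorf("apiserver unreachable: %w", err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("apiserver not ready: %s", resp.Status)
	}
	return nil
}

func main() {
	// "node-3" is a placeholder for a control-plane address.
	if err := checkReadyz("node-3"); err != nil {
		fmt.Println("unhealthy:", err)
		return
	}
	fmt.Println("healthy")
}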

janiskemper commented 1 month ago

Your observations are correct @simonostendorf. We do have a check like this in the code, so you get exactly the behavior you describe: the first node has to be added, and after that we wait until that node is actually reachable. No comment on the healthcheck from my side though :sweat_smile:
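In other words, the attach logic described above boils down to something like the following (a simplified sketch of the described behavior, not the actual caph code):

package lbtargets

// shouldAttachToLoadBalancer sketches the behavior described above: the very
// first control plane is always attached, and every later control plane is
// only attached once the cluster's apiserver is already reachable through the
// load balancer.
func shouldAttachToLoadBalancer(isFirstControlPlane, apiServerReachable bool) bool {
	if isFirstControlPlane {
		return true
	}
	return apiServerReachable
}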

guettli commented 1 month ago

@simonostendorf

But I could not figure out what happens if the apiserver of, say, node-3 goes down from time to time. An HTTP healthcheck would catch this and remove the node from load balancing.

I am unsure about which component is responsible for that. Maybe this is already handled at a higher level (cluster-api, kubernetes). Can you please check how other cluster-api infra providers do that?

simonostendorf commented 1 month ago

@simonostendorf

But I could not figure out what happens if the apiserver of, say, node-3 goes down from time to time. An HTTP healthcheck would catch this and remove the node from load balancing.

I am unsure about which component is responsible for that. Maybe this is already handled at a higher level (cluster-api, kubernetes). Can you please check how other cluster-api infra providers do that?

I don't think that's an infra provider responsibility. I just wanted to clarify why supporting HTTP healthchecks is still a good idea, but not as necessary as initially thought.