karmada-io / karmada

Open, Multi-Cloud, Multi-Cluster Kubernetes Orchestration
https://karmada.io
Apache License 2.0

improve cluster health check logic #3612

Open tedli opened 11 months ago

tedli commented 11 months ago

What would you like to be added:

A configurable cluster health check plugin, to check whether a member cluster can run workloads or not.

Why is this needed:

The current implementation only checks the apiserver's /healthz, which nearly always returns healthy:

https://github.com/kubernetes/apiserver/blob/6e796557fa8a46ec48109166f6db572bbda87267/pkg/server/config.go#L363-L364

https://github.com/karmada-io/karmada/blob/dff7a5b7881ea9f101803ffab04f6f7d0e93bd29/pkg/controllers/status/cluster_status_controller.go#L383-L388

But a healthy apiserver doesn't mean the cluster is healthy: all nodes may be NotReady, for example, in which case pods will never run.

So we'd better add the ability to configure how to check a member cluster, since clusters may come from different distributions (Azure, AWS, Huawei, etc.). At the very least we need to check the node status of the cluster, to catch the case where all nodes are NotReady (due to a failed CNI, etc.).
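A minimal sketch of the node-status check suggested above; the types here are simplified stand-ins for the corev1 types a real implementation would read via client-go, not Karmada's actual code:

```go
package main

import "fmt"

// NodeCondition/Node are illustrative stand-ins for corev1.NodeCondition
// and corev1.Node; a real check would list nodes through client-go.
type NodeCondition struct {
	Type   string
	Status string
}

type Node struct {
	Name       string
	Conditions []NodeCondition
}

// nodeIsReady follows the usual Kubernetes convention: a node is ready
// when its "Ready" condition has status "True".
func nodeIsReady(n Node) bool {
	for _, c := range n.Conditions {
		if c.Type == "Ready" && c.Status == "True" {
			return true
		}
	}
	return false
}

// clusterHasReadyNode reports whether at least one node is Ready --
// the extra signal this issue asks for beyond the apiserver's /healthz.
func clusterHasReadyNode(nodes []Node) bool {
	for _, n := range nodes {
		if nodeIsReady(n) {
			return true
		}
	}
	return false
}

func main() {
	nodes := []Node{
		{Name: "node-a", Conditions: []NodeCondition{{Type: "Ready", Status: "False"}}},
		{Name: "node-b", Conditions: []NodeCondition{{Type: "Ready", Status: "True"}}},
	}
	fmt.Println(clusterHasReadyNode(nodes)) // true: node-b is Ready
}
```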

RainbowMango commented 10 months ago

First, Karmada now uses /readyz to detect cluster health, and falls back to /healthz if there is no response from /readyz. I remember /readyz performs several checks. I'll investigate it and try to list all the checks later.

So we'd better add the ability to configure how to check a member cluster, since clusters may come from different distributions (Azure, AWS, Huawei, etc.),

Do you mean the /readyz or /healthz implementation differs between these distributions?

jwcesign commented 10 months ago

Some users I've met have also asked about this: /readyz and /healthz are not enough to detect the cluster status.

For example, the network from Karmada to a member cluster may be down while the cluster itself runs fine on the cloud provider, in which case a failover may not be necessary.

They may need some custom way to define whether the clusters are healthy.

tedli commented 10 months ago

UPDATE: incorrect, for reference only

Hi all,

Thanks for the reply.

Hi @RainbowMango, I found how the apiserver handles /readyz and /healthz: https://github.com/kubernetes/apiserver/blob/6e796557fa8a46ec48109166f6db572bbda87267/pkg/server/config.go#L363-L403 It seems both readyz and healthz act the same and nearly always return ok, as long as connectivity to the apiserver is not broken. (I didn't set a breakpoint and step through to make sure this exactly matches my understanding; I may have got it wrong.)

And I didn't express myself well: I didn't mean that readyz and healthz vary between different distributions. What I wanted to point out is that, if I'm not mistaken, the apiserver code handling readyz and healthz is just as I linked above, so healthz and readyz alone are not enough to tell whether a cluster is healthy or ready. Extra checks are therefore necessary, but how to perform them varies: AWS EKS, for example, needs various complex roles or role bindings, so a common in-tree built-in check is unlikely to cover all distributions.

So how to check a cluster's health should be up to the user: the user provides a /healthz-like probe and sets it on the Cluster CR, and the cluster controller calls this probe whenever it is set. That makes the check configurable and extensible, and what's more, Karmada doesn't need extra RBAC roles.

RainbowMango commented 10 months ago

Sorry for the slow response, I just got time to dig into this issue.

It seems both readyz and healthz act the same and nearly always return ok, as long as connectivity to the apiserver is not broken.

Given that /healthz has been deprecated since Kubernetes v1.16, my investigation focuses only on /readyz. /readyz does not just check the connection; it runs other checks as well. You can list all available checks using the verbose parameter, like:

# curl -k https://172.18.0.3:6443/readyz?verbose
[+]ping ok
[+]log ok
[+]etcd ok
[+]etcd-readiness ok
[+]informer-sync ok
[+]poststarthook/start-kube-apiserver-admission-initializer ok
[+]poststarthook/generic-apiserver-start-informers ok
[+]poststarthook/priority-and-fairness-config-consumer ok
[+]poststarthook/priority-and-fairness-filter ok
[+]poststarthook/storage-object-count-tracker-hook ok
[+]poststarthook/start-apiextensions-informers ok
[+]poststarthook/start-apiextensions-controllers ok
[+]poststarthook/crd-informer-synced ok
[+]poststarthook/bootstrap-controller ok
[+]poststarthook/rbac/bootstrap-roles ok
[+]poststarthook/scheduling/bootstrap-system-priority-classes ok
[+]poststarthook/priority-and-fairness-config-producer ok
[+]poststarthook/start-cluster-authentication-info-controller ok
[+]poststarthook/start-kube-apiserver-identity-lease-controller ok
[+]poststarthook/start-kube-apiserver-identity-lease-garbage-collector ok
[+]poststarthook/start-legacy-token-tracking-controller ok
[+]poststarthook/aggregator-reload-proxy-client-cert ok
[+]poststarthook/start-kube-aggregator-informers ok
[+]poststarthook/apiservice-registration-controller ok
[+]poststarthook/apiservice-status-available-controller ok
[+]poststarthook/kube-apiserver-autoregistration ok
[+]autoregister-completion ok
[+]poststarthook/apiservice-openapi-controller ok
[+]poststarthook/apiservice-openapiv3-controller ok
[+]shutdown ok
readyz check passed

From the output above, you can see there are several checks.

By the way, ReadyzChecks holds the default checks and is initialized with PingHealthz and LogHealthz; more checks are appended during kube-apiserver startup, for example the etcd-readiness check would be appended here.

RainbowMango commented 10 months ago

The reason I insist on investigating exactly which checks /readyz performs is that I want to figure out the gap between what we have and what users expect. That will guide us in working out new proposals (solutions).

After going through the discussion on #3616, I'm inclined toward the idea mentioned by @Poor12 on https://github.com/karmada-io/karmada/pull/3616#issuecomment-1590400173 that we can have a field on the Cluster object to clearly declare the extension. But I think we need a more detailed specification to address questions like:

We may have more questions in addition to the issues mentioned above. I believe this is not a simple feature that can be finished in just one PR. We need a detailed analysis, and I suggest creating a proposal to address the thoughts on these issues.

RainbowMango commented 10 months ago

@tedli @Poor12 @jwcesign @chaunceyjiang any comments? I think this feature could be an important extension capability for Karmada; it could be used to enhance the cluster failover functionality.

Poor12 commented 10 months ago

I'd like to share my opinion on the above questions:

  1. I think the livenessProbe is an enhancement of the original health check, working as a chain of calls: only after a cluster passes the original health checker may it need to pass the custom health checker.
  2. From the perspective of use cases, the input may be the address of a service, and the output a boolean result that determines whether the cluster is healthy or not.
  3. My answer is yes. If this probe mostly detects cloud vendors' interfaces, those interfaces may differ between vendors.
  4. According to the Kubernetes docs, readyz is a comprehensive check for native Kubernetes. We may not need to extend the exact checkers.
  5. I wonder whether the probe can be deployed in-cluster or in the Karmada control plane. It may differ depending on the deployment mode users adopt.
  6. It would be great if we provided an example probe service.

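Points 1 and 2 above could be sketched as a small chain of checkers; the Checker type and chain helper are illustrative only, not Karmada API:

```go
package main

import "fmt"

// Checker is a hypothetical health-check plugin: input is a cluster
// identifier (in practice, a service address), output is a boolean.
type Checker func(cluster string) bool

// chain combines checkers so each runs only if the previous one passed:
// the built-in check first, then the custom one, as point 1 describes.
func chain(checkers ...Checker) Checker {
	return func(cluster string) bool {
		for _, c := range checkers {
			if !c(cluster) {
				return false
			}
		}
		return true
	}
}

func main() {
	builtin := func(cluster string) bool { return true }                       // e.g. /readyz passed
	custom := func(cluster string) bool { return cluster != "member-broken" }  // user-defined probe
	check := chain(builtin, custom)
	fmt.Println(check("member-1"))      // true
	fmt.Println(check("member-broken")) // false
}
```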
tedli commented 10 months ago

Introducing a new livenessProbe-like field is good. (I used annotations because I'm sure it's a common scenario.)

I have a crude implementation that has already been running for a week. It checks that at least 1 node is ready and that at least 1 non-host-network pod is ready in a member cluster before declaring the member cluster ready.

I think basic checks like those in my implementation could be implemented in-tree, with users able to configure n rather than 1 for the node and pod counts.

Without such in-tree basic checks, configuring a livenessProbe is far more complex, which may make some first-time users give up on Karmada.
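The described checks could be sketched roughly as below; the types are simplified stand-ins for what client-go would return, and this is not the actual patch:

```go
package main

import "fmt"

// Node and Pod are illustrative stand-ins for corev1.Node/corev1.Pod.
type Node struct{ Ready bool }

type Pod struct {
	Ready       bool
	HostNetwork bool
}

// memberClusterReady reports whether at least minNodes nodes are Ready
// and at least minPods non-host-network pods are Ready. Host-network
// pods are excluded because they can run even when the CNI is broken,
// so they say nothing about pod networking.
func memberClusterReady(nodes []Node, pods []Pod, minNodes, minPods int) bool {
	readyNodes := 0
	for _, n := range nodes {
		if n.Ready {
			readyNodes++
		}
	}
	readyPods := 0
	for _, p := range pods {
		if p.Ready && !p.HostNetwork {
			readyPods++
		}
	}
	return readyNodes >= minNodes && readyPods >= minPods
}

func main() {
	nodes := []Node{{Ready: true}, {Ready: false}}
	pods := []Pod{{Ready: true, HostNetwork: true}, {Ready: true}}
	fmt.Println(memberClusterReady(nodes, pods, 1, 1)) // true
}
```

Making minNodes and minPods configurable covers the "n, not 1" generalization mentioned above.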

RainbowMango commented 8 months ago

I'm trying to figure out which PRs/issues should be included in the coming v1.7 release, which is planned for the end of this month. I guess we don't have enough time for this feature, so I'm moving this to v1.8.