knative / serving

Kubernetes-based, scale-to-zero, request-driven compute
https://knative.dev/docs/serving/
Apache License 2.0

Activator health checks #15575

Open thorweijie opened 1 month ago

thorweijie commented 1 month ago


We have a Kubernetes cluster with many inference services. After all the inference services were restarted, we noticed that the istio-proxy containers in the activator pods had high CPU usage and that health checks were failing with response code 0, so we set target burst capacity to 0 to bypass the activator and work around the issue. We noticed that, despite being bypassed, the activator pods kept attempting health checks (still failing with response code 0) until they were restarted. We would like to know whether the activator's health checks are cached, and whether their frequency can be configured.

skonto commented 1 month ago

Hi @thorweijie!

> After all the inference services were restarted, we noticed the istio-proxy container in activator pods were having high cpu usage and health checks were failing with response code 0

Which health checks were failing, the activator's own?

> We noticed that despite being skipped, the activator pods were still trying to perform health checks with response code 0 until they were restarted. We would like to know if the health checks for activator are cached, and whether the frequency of the health checks can be configured?

The probing mechanism starts when endpoints are created or updated, and it probes at a default interval of 200ms. If probing finishes successfully, you should see a message like the following, assuming activator debug logging is enabled:

{"severity":"DEBUG","timestamp":"2024-10-23T14:20:52.082125337Z","logger":"activator","caller":"net/revision_backends.go:348","message":"Done probing, got 1 healthy pods","commit":"0abee66","knative.dev/controller":"activator","knative.dev/pod":"activator-8675c9944c-mdfj9","knative.dev/key":"default/autoscale-go-00001"}

Once all pods are ready (and stay that way), probing should stop. The idea is that the activator stays on standby to handle traffic, so each activator instance needs to know which targets are ready in order to route traffic to them if needed. As far as I know, there is no caching. Maybe @ReToCode or @dprotaso have more to say here.