ray-project / kuberay

A toolkit to run Ray applications on Kubernetes
Apache License 2.0

[Hotfix] Increase the timeout of the ProxyActor health check #2082

Closed: kevin85421 closed this pull request 2 months ago

kevin85421 commented 2 months ago

Why are these changes needed?

I observed that NumServeEndpoints changes frequently, especially after we started watching Endpoints in #2080. The error message is:

Get "http://10.244.0.6:8000/-/healthz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)

The HTTP client's timeout was 20 ms, so I increased it to 2 seconds, the same timeout the dashboard HTTP client uses.

I marked this PR as 'Hotfix' because 20 ms should be enough for my very simple setup (a single Ray node on a local Kind cluster, with no incoming requests). Hence, the instability may be a Ray Serve issue.
