ray-project / kuberay

A toolkit to run Ray applications on Kubernetes
Apache License 2.0

support using proxy subresources when connecting to Ray head node #1980

Closed andrewsykim closed 3 months ago

andrewsykim commented 4 months ago

Why are these changes needed?

There are some cases where KubeRay may not be able to connect directly to a Ray head node. For example, there might be a NetworkPolicy disallowing ingress from all Pods, or KubeRay might be running on a network with no connectivity to Pods. This PR allows KubeRay to use the services/proxy subresource to proxy HTTP requests to the Ray head node through the apiserver. This allows KubeRay to make requests to the head node without ever connecting to it directly.

Here are some sample HTTP requests in apiserver from my testing using the proxy subresource:

I0310 14:30:19.596708       1 httplog.go:131] "HTTP" verb="GET" URI="/api/v1/namespaces/default/services/rayjob-sample-raycluster-phsj9-head-svc:dashboard/proxy/api/jobs/rayjob-sample-th44t" latency="7.239221ms" userAgent="manager/v0.0.0 (linux/amd64) kubernetes/$Format" audit-ID="c40152de-7a81-46b9-ac4a-f2ea296e44f0" srcIP="10.244.0.6:49524" resp=200

I0310 15:20:43.105571       1 httplog.go:131] "HTTP" verb="GET" URI="/api/v1/namespaces/default/pods/rayservice-sample-raycluster-qm2m2-head-xj44d:8000/proxy/-/healthz" latency="2.915789ms" userAgent="manager/v0.0.0 (linux/amd64) kubernetes/$Format" audit-ID="bad00c23-de01-45ed-9fb5-05bc8b2f6c2d" srcIP="10.244.0.6:47274" resp=200

I0310 15:21:07.446881       1 httplog.go:131] "HTTP" verb="GET" URI="/api/v1/namespaces/default/services/rayservice-sample-raycluster-qm2m2-head-svc:dashboard/proxy/api/serve/applications/" latency="15.347398ms" userAgent="manager/v0.0.0 (linux/amd64) kubernetes/$Format" audit-ID="54544346-b827-4f34-b593-bcbc78199611" srcIP="10.244.0.6:47274" resp=200
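For reference, the request paths in the logs above all share one layout: namespace, resource type (`services` or `pods`), `name:port`, then `/proxy/` followed by the path on the head node. Below is a minimal sketch of that layout in Go. The `proxyPath` helper is hypothetical and exists only for illustration; it is not taken from this PR, which may well build requests through a Kubernetes client instead of formatting paths by hand.

```go
package main

import (
	"fmt"
	"strings"
)

// proxyPath builds the apiserver path for a proxy subresource request.
// Hypothetical helper for illustration only; not part of the KubeRay code base.
//
// resource is "services" or "pods". port is whatever follows the colon in the
// logs above: a port name such as "dashboard" or a number such as "8000".
// A leading slash on path is dropped so the result has no double slash.
func proxyPath(namespace, resource, name, port, path string) string {
	return fmt.Sprintf("/api/v1/namespaces/%s/%s/%s:%s/proxy/%s",
		namespace, resource, name, port, strings.TrimPrefix(path, "/"))
}

func main() {
	// Job status from the Ray dashboard, via the head Service's "dashboard" port.
	fmt.Println(proxyPath("default", "services",
		"rayjob-sample-raycluster-phsj9-head-svc", "dashboard",
		"/api/jobs/rayjob-sample-th44t"))

	// Ray Serve health check, via the head Pod on port 8000.
	fmt.Println(proxyPath("default", "pods",
		"rayservice-sample-raycluster-qm2m2-head-xj44d", "8000",
		"/-/healthz"))
}
```

Run as-is, this prints the same URIs as the first two log lines. Because every request goes to the apiserver, the caller only needs connectivity to the apiserver and not to the Pod network.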

Checks

kevin85421 commented 3 months ago

I tested this PR manually.

andrewsykim commented 3 months ago

thanks @kevin85421! I'll look into adding an e2e test as well