ray-project / kuberay

A toolkit to run Ray applications on Kubernetes
Apache License 2.0

[Feature] head pod and worker pods are prepared sequentially #2398

Open · HCharlie opened this issue 1 month ago

HCharlie commented 1 month ago


Description

Hi team,

I noticed that the RayCluster head pod and worker pods are prepared sequentially: a node is provisioned for the head pod and its image is pulled, and only once the head pod is ready and in the Running status do the worker pods start to find a node and pull their image. This sequential behavior roughly doubles the time users have to wait. Is there a way to make it parallel to create a better UX?

When doing this on a cloud such as AWS, a fresh start can take more than 20 minutes, depending on the instance type and image used.

Use case

Make the head pod and worker pods provision nodes and pull images in parallel.

Related issues

No response

Are you willing to submit a PR?

andrewsykim commented 1 month ago

I noticed that the RayCluster head pod and worker pods are prepared sequentially: a node is provisioned for the head pod and its image is pulled, and only once the head pod is ready and in the Running status do the worker pods start to find a node and pull their image.

I don't think this is the current behavior. KubeRay creates the pods sequentially, but it doesn't wait for the head pod to become ready before creating the worker pods.

Here's a simple test I just ran:

$ kind create cluster
Creating cluster "kind" ...
 ✓ Ensuring node image (kindest/node:v1.27.3) đŸ–ŧ
 ✓ Preparing nodes đŸ“Ļ
 ✓ Writing configuration 📜
 ✓ Starting control-plane 🕹ī¸
 ✓ Installing CNI 🔌
 ✓ Installing StorageClass 💾
Set kubectl context to "kind-kind"
You can now use your cluster with:

kubectl cluster-info --context kind-kind

Thanks for using kind! 😊
$ helm install kuberay-operator kuberay/kuberay-operator --version 1.2.1
NAME: kuberay-operator
LAST DEPLOYED: Tue Sep 24 14:13:39 2024
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
$ kubectl apply -f https://raw.githubusercontent.com/ray-project/kuberay/master/ray-operator/config/samples/ray-cluster.complete.yaml
raycluster.ray.io/raycluster-complete created

$ kubectl get po
NAME                                           READY   STATUS              RESTARTS   AGE
kuberay-operator-84fb78dcfd-66bzx              1/1     Running             0          20s
raycluster-complete-head-jl8xp                 0/1     ContainerCreating   0          2s
raycluster-complete-small-group-worker-xnhlr   0/1     Init:0/1            0          1s
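
As a further cross-check (a sketch, not part of the run above; it assumes the ray.io/cluster label that KubeRay applies to the pods it manages), comparing the pods' creation timestamps shows when the operator created each pod object, independent of when the pod became Ready:

# Both pods should show creation timestamps within a second or two of each other.
$ kubectl get pods -l ray.io/cluster=raycluster-complete \
    -o custom-columns=NAME:.metadata.name,CREATED:.metadata.creationTimestamp
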
andrewsykim commented 1 month ago

What version of KubeRay are you using?

HCharlie commented 1 month ago

Hi @andrewsykim, thanks for the reply. The kuberay-operator I am using should be v1.1.0; the image tag is 40a946a. Maybe my wording was not precise. What I notice is that only once the head pod is in the Running status do the worker pods start to find instances and pull their images; before that, the worker pods just stay in the Pending status. Is there a way to parallelize this for the head pod and worker pods? Or is this not actually the case and I am observing something wrong, and some configuration is needed?

My setup has several EC2 instances provisioned by Karpenter; on each of them a single worker pod takes up almost all of the resources (CPU, GPU, memory).
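
One way to see what the Pending worker pods are waiting on (a sketch; <worker-pod-name> is a placeholder) is to check their scheduling events, which show whether they are blocked on Karpenter provisioning a node or on something else:

# The Events section at the end shows scheduling and node-provisioning messages for the pod.
$ kubectl describe pod <worker-pod-name>
# Cluster-wide events in chronological order, including node provisioning and image pulls.
$ kubectl get events --sort-by=.metadata.creationTimestamp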

Oddly, when I ran the example you shared locally on my MacBook, it spun up both the head pod and the worker pod pretty quickly.

HCharlie commented 1 month ago

I think you are right. I checked again and noticed in the AWS console that the instances for the head and worker pods are initialized together; the difference in instance type and image pull time probably gave me the wrong impression that things were done sequentially. Thanks again.
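
For what it's worth, the image pull times themselves can be compared (a sketch; it relies on the kubelet's standard "Pulled" events, whose messages typically include how long each pull took):

# List image pull events across all namespaces; compare the durations reported on the
# head node with those on the GPU worker nodes.
$ kubectl get events -A --field-selector reason=Pulled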

andrewsykim commented 1 month ago

Your worker nodes are likely using GPUs and larger instance types that can take longer to scale up and initialize. That could explain the later start-up time for your worker pods compared to the head pod. Usually the head pod is CPU-only and can run on standard instance types.
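
To confirm which resource requests are driving the different instance types (a sketch; the pod names are placeholders), the head and worker requests can be compared side by side; a nvidia.com/gpu request on the worker is what forces Karpenter to launch a GPU instance type:

# Print the resource requests of the first container in each pod.
$ kubectl get pod <head-pod-name> -o jsonpath='{.spec.containers[0].resources.requests}{"\n"}'
$ kubectl get pod <worker-pod-name> -o jsonpath='{.spec.containers[0].resources.requests}{"\n"}'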

HCharlie commented 1 month ago

That's exactly the case.