BeerTai closed this issue 4 hours ago.
Which KubeRay version are you using? If you're using KubeRay v1.2.2, you can run
kubectl describe raycluster $YOUR_RAYCLUSTER
and check the Kubernetes events to see why the pods could not be created.
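A minimal sketch of the same check done over the namespace's event list, assuming the events were saved with `kubectl get events -n <namespace> -o json > events.json`. The helper name `cluster_warnings` is hypothetical, and matching pods by the `<cluster-name>-` prefix is an assumption based on pod names like `raycluster-complete-head-x8x26` in this thread. It keeps only Warning events recorded on the RayCluster itself or on its pods, since a pod that was created but cannot be scheduled reports its reason on the pod, not on the RayCluster.

```python
def cluster_warnings(events_json: dict, cluster_name: str) -> list[str]:
    """Collect Warning events for a RayCluster and, approximately, its pods.

    `events_json` is assumed to be the parsed output of
    `kubectl get events -n <namespace> -o json` (a core/v1 EventList).
    Pods are matched by the "<cluster_name>-" name prefix, which is a
    heuristic: another cluster whose name shares that prefix also matches.
    """
    warnings = []
    for event in events_json.get("items", []):
        if event.get("type") != "Warning":
            continue  # Normal events rarely explain a missing pod
        obj = event.get("involvedObject", {})
        kind, name = obj.get("kind"), obj.get("name", "")
        on_cluster = kind == "RayCluster" and name == cluster_name
        on_pod = kind == "Pod" and name.startswith(f"{cluster_name}-")
        if on_cluster or on_pod:
            warnings.append(f"{kind}/{name} {event.get('reason')}: {event.get('message')}")
    return warnings
```

Usage, assuming the cluster is named `raycluster-complete` (inferred from the head service name): load the file with `json.load` and print `cluster_warnings(data, "raycluster-complete")` line by line.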
I ran kubectl describe raycluster and got:
Status:
  Desired CPU: 10
  Desired GPU: 8
  Desired Memory: 520Gi
  Desired TPU: 0
  Desired Worker Replicas: 1
  Endpoints:
    Client: 10001
    Dashboard: 8265
    Gcs: 6379
    Metrics: 8080
  Head:
    Pod IP: 10.233.105.84
    Pod Name: raycluster-complete-head-x8x26
    Service IP: 10.233.105.84
    Service Name: raycluster-complete-head-svc
  Last Update Time: 2024-11-12T01:17:32Z
  Max Worker Replicas: 4
  Min Worker Replicas: 1
  Observed Generation: 2
  State: ready
  State Transition Times:
    Ready: 2024-11-11T08:45:40Z
Events: <none>
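The status above reports `Desired Worker Replicas: 1` and `State: ready`, yet `Events: <none>`, so `describe` alone does not say where the worker went. A useful next step is to separate "the worker pod was never created" from "it was created but is stuck in Pending" (for example, because no node can satisfy its GPU request). The sketch below assumes the pod list came from `kubectl get pods -l ray.io/cluster=raycluster-complete -o json` and that KubeRay labels worker pods with `ray.io/node-type=worker`; the cluster name is inferred from the head service name, and both helper names are hypothetical.

```python
def worker_pod_phases(pods_json: dict) -> dict[str, str]:
    """Map each worker pod name to its phase (Pending, Running, ...).

    Assumes worker pods carry the label ray.io/node-type=worker.
    """
    phases = {}
    for pod in pods_json.get("items", []):
        meta = pod.get("metadata", {})
        if meta.get("labels", {}).get("ray.io/node-type") != "worker":
            continue  # skip the head pod and anything unlabeled
        phases[meta.get("name", "?")] = pod.get("status", {}).get("phase", "Unknown")
    return phases


def diagnose(pods_json: dict, desired_workers: int) -> str:
    """Compare actual worker pods with the desired replica count."""
    phases = worker_pod_phases(pods_json)
    if len(phases) < desired_workers:
        return (f"only {len(phases)} of {desired_workers} worker pods exist; "
                "check RayCluster events for creation errors")
    pending = sorted(name for name, phase in phases.items() if phase == "Pending")
    if pending:
        return f"worker pods created but Pending: {', '.join(pending)}; check pod events"
    return "all desired worker pods exist"
```

With the numbers from this report, `diagnose(pods, 1)` on a list that contains only the head pod would point at pod creation rather than scheduling.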
Search before asking
KubeRay Component
ray-operator
What happened + What you expected to happen
When I request nvidia/gpu in the YAML file, the worker pod cannot be created, but with plain CPU resources only, the pod is created correctly.
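The YAML itself is not included in the report. For reference, the NVIDIA device plugin advertises GPUs as the extended resource `nvidia.com/gpu`, and Kubernetes expects GPUs in `limits`: a request is optional, but if present it must equal the limit. It is unclear from the report whether "nvidia/gpu" above is shorthand or the literal resource name. The following is a hypothetical pre-flight check of a worker container's `resources` block written as a Python dict; the function name and the NVIDIA-only assumption are illustrative, not part of KubeRay.

```python
GPU_RESOURCE = "nvidia.com/gpu"  # assumed: name advertised by the NVIDIA device plugin


def check_gpu_resources(resources: dict) -> list[str]:
    """Return problems found in a container's `resources` block.

    Checks that GPU resource names match GPU_RESOURCE, that a GPU request
    is never set without a limit, and that request and limit are equal.
    """
    limits = resources.get("limits") or {}
    requests = resources.get("requests") or {}
    problems = []
    # Flag GPU-looking names that the device plugin would not recognize.
    for section, values in (("limits", limits), ("requests", requests)):
        for name in values:
            if name.endswith("/gpu") and name != GPU_RESOURCE:
                problems.append(
                    f"{section}: unexpected GPU resource name {name!r}; expected {GPU_RESOURCE!r}")
    if GPU_RESOURCE in requests and GPU_RESOURCE not in limits:
        problems.append("GPU request set without a matching limit")
    elif GPU_RESOURCE in requests and int(str(requests[GPU_RESOURCE])) != int(str(limits[GPU_RESOURCE])):
        problems.append("GPU request and limit differ; they must be equal")
    return problems
```

Quantities are compared via `int(str(...))` because YAML may yield either `8` or `"8"`; fractional GPU quantities are not valid for this resource, so integer parsing is assumed to be enough.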
Reproduction script
Anything else
Only the head pod is created.
Are you willing to submit a PR?