Open tsailiming opened 2 weeks ago
@astefanutti Filed this as per your request.
@tsailiming what's the KubeRay version? In previous versions it is a known issue that the RayCluster status stays ready indefinitely once it has observed all worker pods as running. There's some discussion about it in https://github.com/ray-project/kuberay/pull/1930
From one of the head pods. This is from OpenShift AI 2.9.1.
$ ray --version
ray, version 2.7.1
@tsailiming I meant the KubeRay version, not the Ray version
Search before asking
KubeRay Component
apiserver
What happened + What you expected to happen
When there are pods stuck in Pending because of insufficient resources, the RayCluster state is reported as ready. This is the status from the head pod.
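The mismatch described above (a RayCluster state of ready while some of its pods are still Pending) could be checked for with something like the following. This is only a sketch under assumptions: it takes the RayCluster and Pod objects as plain dictionaries, for example parsed from `kubectl get ... -o json`, and reads the `status.state` and `status.phase` fields. The function names and the sample data are hypothetical and are not taken from this issue.

```python
from typing import Any, Dict, List


def find_pending_pods(pods: List[Dict[str, Any]]) -> List[str]:
    """Return the names of pods whose status.phase is "Pending"."""
    return [
        pod["metadata"]["name"]
        for pod in pods
        if pod.get("status", {}).get("phase") == "Pending"
    ]


def is_misreported_ready(ray_cluster: Dict[str, Any], pods: List[Dict[str, Any]]) -> bool:
    """True if the cluster state is "ready" while at least one pod is Pending."""
    state = ray_cluster.get("status", {}).get("state")
    return state == "ready" and len(find_pending_pods(pods)) > 0


# Hypothetical sample data, shaped like the JSON objects kubectl returns.
sample_cluster = {"metadata": {"name": "example"}, "status": {"state": "ready"}}
sample_pods = [
    {"metadata": {"name": "example-head"}, "status": {"phase": "Running"}},
    {"metadata": {"name": "example-worker-0"}, "status": {"phase": "Pending"}},
]

if __name__ == "__main__":
    print(find_pending_pods(sample_pods))
    print(is_misreported_ready(sample_cluster, sample_pods))
```

In practice the pod list would be the `items` array of a pod listing filtered to the cluster in question. Which label selector to use for that filter depends on the KubeRay version and is left as an assumption here.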
Reproduction script
ClusterQueue quota requirement so that it runs and is not in the Suspended state
Anything else
No response
Are you willing to submit a PR?