ray-project / kuberay

A toolkit to run Ray applications on Kubernetes
Apache License 2.0
963 stars 328 forks source link

[Bug] Fix RayCluster with an overridden app.kubernetes.io/name (#2147) #2166

Closed rueian closed 1 month ago

rueian commented 1 month ago

Why are these changes needed?

The current client.MatchingLabels(common.HeadServiceLabels(*instance)) includes the app.kubernetes.io/name and app.kubernetes.io/created-by labels with their default values to find the Head service of a RayCluster. These default values make the controller fail to find the Head service if they are overridden. We should not rely on labels that can be overridden by users to find the Head service.

This PR replaces the current behavior with a new association function RayClusterHeadServiceListOptions that only uses the following labels to find the Head service.

utils.RayClusterLabelKey:  instance.Name,
utils.RayNodeTypeLabelKey: string(rayv1.HeadNode),
utils.RayIDLabelKey:       utils.CheckLabel(utils.GenerateIdentifier(instance.Name, rayv1.HeadNode)),

The PR also adds a new e2e test ray-job.custom-k8s-app.yaml showing that it works and fixes the #2147.

image

Related issue number

2147

Checks

rueian commented 1 month ago

Running an e2e test seems a bit overkill for me. Perhaps we could use an envtest instead?

Sure. The test has been moved to the envtest.