ray-project / kuberay

A toolkit to run Ray applications on Kubernetes

[Bug] RayService's Head Service does not inherit annotations from Head Pod's Service #1092

Open · rjmco opened this issue 1 year ago

rjmco commented 1 year ago

KubeRay Component

ray-operator

What happened + What you expected to happen

If a spec.rayClusterConfig.headServiceAnnotations value is declared on a RayService manifest, the "head" Service created by the RayCluster (e.g. rayservice-sample-raycluster-ffd2k-head-svc) will include those annotations, but the "head" Service created by the RayService (i.e. rayservice-sample-head-svc) will not.

I would have expected the "head" Service created by the RayService to have the same annotations because it should be a replica of the active RayCluster's "head" Service.

Reproduction script

  1. Apply a modified kuberay/ray-operator/config/samples/ray_v1alpha1_rayservice.yaml manifest, adding the spec.rayClusterConfig.headServiceAnnotations: {"test": "test"} value as shown below:
kubectl apply -f - <<EOF
# Make sure to increase resource requests and limits before using this example in production.
# For examples with more realistic resource configuration, see
# ray-cluster.complete.large.yaml and
# ray-cluster.autoscaler.large.yaml.
apiVersion: ray.io/v1alpha1
kind: RayService
metadata:
  name: rayservice-sample
spec:
  serviceUnhealthySecondThreshold: 300 # Config for the health check threshold for service. Default value is 60.
  deploymentUnhealthySecondThreshold: 300 # Config for the health check threshold for deployments. Default value is 60.
  serveConfig:
    importPath: fruit.deployment_graph
    runtimeEnv: |
      working_dir: "https://github.com/ray-project/test_dag/archive/41d09119cbdf8450599f993f51318e9e27c59098.zip"
    deployments:
      - name: MangoStand
        numReplicas: 1
        userConfig: |
          price: 3
        rayActorOptions:
          numCpus: 0.1
      - name: OrangeStand
        numReplicas: 1
        userConfig: |
          price: 2
        rayActorOptions:
          numCpus: 0.1
      - name: PearStand
        numReplicas: 1
        userConfig: |
          price: 1
        rayActorOptions:
          numCpus: 0.1
      - name: FruitMarket
        numReplicas: 1
        rayActorOptions:
          numCpus: 0.1
      - name: DAGDriver
        numReplicas: 1
        routePrefix: "/"
        rayActorOptions:
          numCpus: 0.1
  rayClusterConfig:
    rayVersion: '2.4.0' # should match the Ray version in the image of the containers
    ######################headGroupSpecs#################################
    # Ray head pod template.
    headGroupSpec:
      # the following params are used to complete the ray start command: ray start --head --block --redis-port=6379 ...
      rayStartParams:
        port: '6379' # should match container port named gcs-server
        dashboard-host: '0.0.0.0'
        num-cpus: '2' # can be auto-completed from the limits
      #pod template
      template:
        spec:
          containers:
            - name: ray-head
              image: rayproject/ray:2.4.0
              resources:
                limits:
                  cpu: 2
                  memory: 2Gi
                requests:
                  cpu: 2
                  memory: 2Gi
              ports:
                - containerPort: 6379
                  name: gcs-server
                - containerPort: 8265 # Ray dashboard
                  name: dashboard
                - containerPort: 10001
                  name: client
                - containerPort: 8000
                  name: serve
    workerGroupSpecs:
      # the pod replicas in this group typed worker
      - replicas: 1
        minReplicas: 1
        maxReplicas: 5
        # logical group name, for this called small-group, also can be functional
        groupName: small-group
        rayStartParams: {}
        #pod template
        template:
          spec:
            containers:
              - name: ray-worker # must consist of lower case alphanumeric characters or '-', and must start and end with an alphanumeric character (e.g. 'my-name' or '123-abc')
                image: rayproject/ray:2.4.0
                lifecycle:
                  preStop:
                    exec:
                      command: ["/bin/sh","-c","ray stop"]
                resources:
                  limits:
                    cpu: "1"
                    memory: "2Gi"
                  requests:
                    cpu: "500m"
                    memory: "2Gi"
    headServiceAnnotations:
      test: test
EOF
  2. Verify that the annotations are missing from the "head" Service created by the RayService (i.e. rayservice-sample-head-svc) but are present on the "head" Service created by the RayCluster (i.e. rayservice-sample-raycluster-ffd2k-head-svc):
$ kubectl get svc -l ray.io/node-type=head -o custom-columns='NAME:metadata.name,ANNOTATIONS:metadata.annotations'
NAME                                          ANNOTATIONS
rayservice-sample-head-svc                    <none>
rayservice-sample-raycluster-ffd2k-head-svc   map[test:test]
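
If the RayService-created Service inherited the annotations as I expect, both Services would show them. A sketch of the expected output (the -ffd2k suffix is cluster-specific):

NAME                                          ANNOTATIONS
rayservice-sample-head-svc                    map[test:test]
rayservice-sample-raycluster-ffd2k-head-svc   map[test:test]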

Anything else

The ability to add annotations to Services, and to change other fields in the Service spec, is important for taking full advantage of some cloud providers' load balancer features. In my case, I aim to use the annotation cloud.google.com/backend-config: {"default": "head-svc-backend-config"} to configure the GCP load balancer to use HTTP health checks and to restrict access to the Ray dashboard endpoint behind an authentication portal using Identity-Aware Proxy.
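
For reference, a minimal sketch of the GKE BackendConfig that annotation would point at, assuming the annotation is propagated to rayservice-sample-head-svc; the health-check port, path, and Secret name below are illustrative, not taken from this issue:

apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: head-svc-backend-config
spec:
  healthCheck:
    type: HTTP
    port: 8265       # illustrative: the Ray dashboard containerPort from the manifest above
    requestPath: /   # illustrative health-check path
  iap:
    enabled: true
    oauthclientCredentials:
      secretName: iap-oauth-client  # hypothetical Secret holding the IAP OAuth client ID and secret

With the fix in place, the inherited cloud.google.com/backend-config annotation on the head Service would let the GKE ingress controller attach this BackendConfig to the Service's backends.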

I am willing to submit a PR to fix the behavior to match my expectations. In fact, I already have a working fix at https://github.com/ray-project/kuberay/compare/master...rjmco:kuberay:0-5-0-fix-lack-of-annotations-on-rayservice-service, to which I can add unit tests before submitting.

However, after searching the issues before submitting this one, I found that @kevin85421 and @architkulkarni are working on #877 and #1040, which will completely change the CRD's interface in the next version, and I am also not sure whether the team agrees with the expected behavior that I described above.

Does the team agree that the "head" Service created by the RayService should behave in the way that I described above, or should there be an independent field in the CRD to manage which annotations the "head" Service created by the RayService should contain?

Are you willing to submit a PR?

architkulkarni commented 1 year ago

My thought is that the head Service should indeed behave the way you describe. I think your fix makes sense, but you're right that after PR #1040, the recommended way will be to set the annotations directly in spec.headGroupSpec.headService, specifically spec.headGroupSpec.headService.metadata.annotations, and these annotations should also be inherited by the RayService. So I think there should be no need to submit your fix as a PR.
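
For illustration, assuming the CRD shape described in #1040 (a sketch only; the exact schema may differ in the released version), the annotations from the reproduction manifest above would move to:

spec:
  rayClusterConfig:
    headGroupSpec:
      headService:
        metadata:
          annotations:
            test: test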

@kevin85421 can comment if I've said anything wrong here.