ray-project / kuberay

A toolkit to run Ray applications on Kubernetes
Apache License 2.0

[Bug] rayservice deployment non-executable issues #1090

Closed park12sj closed 1 year ago

park12sj commented 1 year ago

Search before asking

KubeRay Component

Others

What happened + What you expected to happen


WARNING 2023-05-13 23:26:26,277 controller 162 deployment_state.py:1613 - Deployment "MangoStand" has 1 replicas that have taken more than 30s to be scheduled. This may be caused by waiting for the cluster to auto-scale, or waiting for a runtime environment to install. Resources required for each replica: {"CPU": 3.0}, resources available: {"CPU": 5.6}.
WARNING 2023-05-13 23:26:56,365 controller 162 deployment_state.py:1613 - Deployment "MangoStand" has 1 replicas that have taken more than 30s to be scheduled. This may be caused by waiting for the cluster to auto-scale, or waiting for a runtime environment to install. Resources required for each replica: {"CPU": 3.0}, resources available: {"CPU": 5.6}.
WARNING 2023-05-13 23:27:26,473 controller 162 deployment_state.py:1613 - Deployment "MangoStand" has 1 replicas that have taken more than 30s to be scheduled. This may be caused by waiting for the cluster to auto-scale, or waiting for a runtime environment to install. Resources required for each replica: {"CPU": 3.0}, resources available: {"CPU": 5.6}.
WARNING 2023-05-13 23:27:56,504 controller 162 deployment_state.py:1613 - Deployment "MangoStand" has 1 replicas that have taken more than 30s to be scheduled. This may be caused by waiting for the cluster to auto-scale, or waiting for a runtime environment to install. Resources required for each replica: {"CPU": 3.0}, resources available: {"CPU": 5.6}.

Reproduction script

# Make sure to increase resource requests and limits before using this example in production.
# For examples with more realistic resource configuration, see
# ray-cluster.complete.large.yaml and
# ray-cluster.autoscaler.large.yaml.
apiVersion: ray.io/v1alpha1
kind: RayService
metadata:
  name: rayservice-sample
spec:
  serviceUnhealthySecondThreshold: 300 # Config for the health check threshold for service. Default value is 60.
  deploymentUnhealthySecondThreshold: 300 # Config for the health check threshold for deployments. Default value is 60.
  serveConfig:
    importPath: fruit.deployment_graph
    runtimeEnv: |
      working_dir: "https://github.com/ray-project/test_dag/archive/41d09119cbdf8450599f993f51318e9e27c59098.zip"
    deployments:
      - name: MangoStand
        numReplicas: 1
        userConfig: |
          price: 3
        rayActorOptions:
          numCpus: 3
      - name: OrangeStand
        numReplicas: 1
        userConfig: |
          price: 2
        rayActorOptions:
          numCpus: 0.1
      - name: PearStand
        numReplicas: 1
        userConfig: |
          price: 1
        rayActorOptions:
          numCpus: 0.1
      - name: FruitMarket
        numReplicas: 1
        rayActorOptions:
          numCpus: 0.1
      - name: DAGDriver
        numReplicas: 1
        routePrefix: "/"
        rayActorOptions:
          numCpus: 0.1
  rayClusterConfig:
    rayVersion: '2.3.0' # should match the Ray version in the image of the containers
    ######################headGroupSpecs#################################
    # Ray head pod template.
    headGroupSpec:
      serviceType: LoadBalancer # optional
      # the following params are used to complete the ray start command: ray start --head --block --port=6379 ...
      rayStartParams:
        port: '6379' # should match container port named gcs-server
        dashboard-host: '0.0.0.0'
        num-cpus: '2' # can be auto-completed from the limits
        block: 'true'
      #pod template
      template:
        spec:
          containers:
            - name: ray-head
              image: rayproject/ray:2.3.0
              resources:
                limits:
                  cpu: 2
                  memory: 2Gi
                requests:
                  cpu: 2
                  memory: 2Gi
              ports:
                - containerPort: 6379
                  name: gcs-server
                - containerPort: 8265 # Ray dashboard
                  name: dashboard
                - containerPort: 10001
                  name: client
                - containerPort: 8000
                  name: serve
    workerGroupSpecs:
      # the pod replicas in this group typed worker
      - replicas: 2
        minReplicas: 2
        maxReplicas: 5
        # logical group name, here called small-group; it can also be functional
        groupName: small-group
        rayStartParams:
          block: 'true'
        #pod template
        template:
          spec:
            initContainers:
              # the env var $FQ_RAY_IP is set by the operator if missing, with the value of the head service name
              - name: init
                image: busybox:1.28
                command: ['sh', '-c', "until nslookup $RAY_IP.$(cat /var/run/secrets/kubernetes.io/serviceaccount/namespace).svc.cluster.local; do echo waiting for K8s Service $RAY_IP; sleep 2; done"]
            containers:
              - name: ray-worker # must consist of lower case alphanumeric characters or '-', and must start and end with an alphanumeric character (e.g. 'my-name' or '123-abc')
                image: rayproject/ray:2.3.0
                lifecycle:
                  preStop:
                    exec:
                      command: ["/bin/sh","-c","ray stop"]
                resources:
                  limits:
                    cpu: "2"
                    memory: "2Gi"
                  requests:
                    cpu: "500m"
                    memory: "2Gi"
    headServiceAnnotations: {}
      # annotations passed on for the Head Service
      # service_key: "service_value"

Anything else

No response

Are you willing to submit a PR?

Yicheng-Lu-llll commented 1 year ago

In my understanding, you need a single pod with at least 3 logical CPUs, because each Serve replica is scheduled as one actor on one node. In your case, every pod has only 2 logical CPUs.
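
To see which nodes can actually host the replica, compare per-node capacity against the request; the warning's "resources available: 5.6" is a cluster-wide sum, not a per-node figure. A minimal diagnostic sketch (assuming it runs on a pod of the cluster; the REQUIRED_CPUS constant is just for illustration):

import ray

# Connect to the already-running cluster from one of its pods.
ray.init(address="auto")

REQUIRED_CPUS = 3.0  # MangoStand's per-replica request

# A Serve replica is a single actor, so its resource request must fit
# on ONE node; summing CPUs across pods does not help.
for node in ray.nodes():
    if not node["Alive"]:
        continue
    cpus = node["Resources"].get("CPU", 0)
    verdict = "can host the replica" if cpus >= REQUIRED_CPUS else "too small"
    print(f"{node['NodeManagerAddress']}: {cpus} CPUs -> {verdict}")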

park12sj commented 1 year ago

@Yicheng-Lu-llll

Isn't the idea that Ray Serve pools the resources of multiple pods?

For example, my end goal is to serve models that require multiple GPUs for inference. I tried to cluster several worker pods, each with GPU resources, and do multi-GPU serving. Is this impossible?

To sum up, I would like to do multi-node, multi-GPU serving for one large model.

akshay-anyscale commented 1 year ago

Please take a look at Aviary - https://www.anyscale.com/blog/announcing-aviary-open-source-multi-llm-serving-solution . The GitHub repo has some examples of how to set up multi-node/multi-GPU models with Ray Serve using placement groups.
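
For context, a rough sketch of the pattern (the placement_group_bundles and placement_group_strategy options on serve.deployment are experimental and were added after Ray 2.3, so this needs a newer Ray; the LargeModel class and bundle shapes here are hypothetical):

from ray import serve

# Each replica reserves a placement group of two 1-GPU bundles, so one
# logical replica can use GPUs that Ray guarantees to co-schedule.
@serve.deployment(
    num_replicas=1,
    ray_actor_options={"num_cpus": 1},
    placement_group_bundles=[{"CPU": 1, "GPU": 1}, {"GPU": 1}],
    placement_group_strategy="PACK",  # "SPREAD" allows bundles on different nodes
)
class LargeModel:
    def __init__(self):
        # The replica actor lands in the first bundle; child actors that
        # hold model shards can be scheduled into the remaining bundles.
        pass

    async def __call__(self, request):
        ...

app = LargeModel.bind()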