ray-project / kuberay

A toolkit to run Ray applications on Kubernetes
Apache License 2.0

[Bug] apiserver failed to create ray-serve app #1188

Closed Yifan122 closed 11 months ago

Yifan122 commented 1 year ago

KubeRay Component

apiserver

What happened + What you expected to happen

I am trying to use the apiserver to manage a Ray cluster. I followed the tutorial at https://ray-project.github.io/kuberay/components/apiserver/, but creating the Ray Serve app failed: the Ray clusters keep getting created and deleted.

Environment: KubeRay 0.5.0, Ray 2.5.0

Reproduction script

Create the apiserver and operator:

helm install kuberay-apiserver kuberay/kuberay-apiserver --version 0.5.0 -n ray
helm install kuberay-operator kuberay/kuberay-operator --version 0.5.0 -n ray

Use the API to create a compute template:

curl -X POST 'localhost:8888/apis/v1alpha2/namespaces/ray/compute_templates' \
--header 'Content-Type: application/json' \
--data '{
  "name": "default-template",
  "namespace": "ray",
  "cpu": 1,
  "memory": 1
}'

Use the API to create the Ray Serve app:

curl -X POST 'localhost:8888/apis/v1alpha2/namespaces/ray/services' \
--header 'Content-Type: application/json' \
--data '{
  "name": "test3",
  "namespace": "ray",
  "user": "user",
  "serveDeploymentGraphSpec": {
      "importPath": "fruit.deployment_graph",
      "runtimeEnv": "working_dir: \"https://github.com/ray-project/test_dag/archive/c620251044717ace0a4c19d766d43c5099af8a77.zip\"\n",
      "serveConfigs": [
      {
        "deploymentName": "OrangeStand",
        "replicas": 1,
        "userConfig": "price: 2",
        "actorOptions": {
          "cpusPerActor": 0.1
        }
      },
      {
        "deploymentName": "PearStand",
        "replicas": 1,
        "userConfig": "price: 1",
        "actorOptions": {
          "cpusPerActor": 0.1
        }
      },
      {
        "deploymentName": "FruitMarket",
        "replicas": 1,
        "actorOptions": {
          "cpusPerActor": 0.1
        }
      },
      {
        "deploymentName": "DAGDriver",
        "replicas": 1,
        "routePrefix": "/",
        "actorOptions": {
          "cpusPerActor": 0.1
        }
      }]
  },
  "clusterSpec": {
    "headGroupSpec": {
      "computeTemplate": "default-template",
      "image": "rayproject/ray:2.5.0",
      "serviceType": "NodePort",
      "rayStartParams": {
        "dashboard-host": "0.0.0.0",
        "metrics-export-port": "8080"
      },
      "volumes": []
    },
    "workerGroupSpec": [
      {
        "groupName": "small-wg",
        "computeTemplate": "default-template",
        "image": "rayproject/ray:2.5.0",
        "replicas": 1,
        "minReplicas": 1,
        "maxReplicas": 1,
        "rayStartParams": {
          "node-ip-address": "$MY_POD_IP"
        }
      }
    ]
  }
}'
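
To observe the create/delete loop after sending the request above, something like the following could help. This is only a sketch: `run` and `diagnose` are hypothetical helper names, not part of KubeRay, and the operator deployment name `kuberay-operator` is an assumption based on the Helm release name used above. The kubectl subcommands themselves are standard.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of KubeRay): prints or runs a few standard
# kubectl commands to inspect the RayService from the reproduction steps.
NAMESPACE="${NAMESPACE:-ray}"     # namespace used in this report
SERVICE="${SERVICE:-test3}"       # "name" field from the request body
OPERATOR="${OPERATOR:-deployment/kuberay-operator}"  # assumed deployment name

# With DRY_RUN set, print each command instead of executing it.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

diagnose() {
  # The RayService and the RayClusters currently present in the namespace.
  run kubectl get rayservices -n "$NAMESPACE"
  run kubectl get rayclusters -n "$NAMESPACE"
  # Status and any recorded events for the RayService.
  run kubectl describe rayservice "$SERVICE" -n "$NAMESPACE"
  # Recent operator logs, where reconcile errors would be expected to show up.
  run kubectl logs "$OPERATOR" -n "$NAMESPACE" --tail=200
}
```

Running `DRY_RUN=1 diagnose` only prints the commands; `diagnose` on its own runs them against the current kubectl context.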

The result: (screenshot attached in the original issue, not preserved here)

Anything else

No response

kevin85421 commented 11 months ago

@blublinsky has identified that this KubeRay API server issue can be closed.