kubeflow / arena

A CLI for Kubeflow.
Apache License 2.0

Support ray job #1120

Open qile123 opened 2 weeks ago

qile123 commented 2 weeks ago

Hello, I would like to take on the work of adding RayJob support to arena. Here is my general design:

Usage example:

arena submit rayjob \
    --name=rayjob-sample \
    --image=rayproject/ray:2.34.0 \
    --head-memory=4Gi \
    --worker-replicas=1 \
    --worker-memory=4Gi \
    --entrypoint="python /home/ray/samples/sample_code.py"
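To make the intent of the flags above concrete, here is a rough sketch of the kind of KubeRay RayJob resource they could map to. It is only an illustration under assumptions: the helper `build_rayjob_manifest`, the container names, and the worker group name `default-group` are hypothetical, and arena may well render the resource differently (for example through a chart template).

```python
import json


def build_rayjob_manifest(name, image, entrypoint, head_memory,
                          worker_replicas, worker_memory):
    """Return a dict shaped like a KubeRay RayJob custom resource.

    Hypothetical helper for illustration; not part of arena.
    """

    def pod_template(container_name, memory):
        # One container per pod; memory is set as both request and limit.
        return {
            "spec": {
                "containers": [{
                    "name": container_name,
                    "image": image,
                    "resources": {
                        "requests": {"memory": memory},
                        "limits": {"memory": memory},
                    },
                }]
            }
        }

    return {
        "apiVersion": "ray.io/v1",
        "kind": "RayJob",
        "metadata": {"name": name},
        "spec": {
            "entrypoint": entrypoint,
            "rayClusterSpec": {
                "headGroupSpec": {
                    "rayStartParams": {},
                    "template": pod_template("ray-head", head_memory),
                },
                "workerGroupSpecs": [{
                    "groupName": "default-group",  # assumed group name
                    "replicas": worker_replicas,
                    "rayStartParams": {},
                    "template": pod_template("ray-worker", worker_memory),
                }],
            },
        },
    }


# Values taken from the usage example above.
manifest = build_rayjob_manifest(
    name="rayjob-sample",
    image="rayproject/ray:2.34.0",
    entrypoint="python /home/ray/samples/sample_code.py",
    head_memory="4Gi",
    worker_replicas=1,
    worker_memory="4Gi",
)
print(json.dumps(manifest, indent=2))
```

The head and worker pods share one image here because the usage example passes a single `--image` flag.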

Parameters that need to be specified:

  1. The name of the Ray job (--name).
  2. The container image for the Ray job (--image).
  3. The command to run when the Ray job is submitted (--entrypoint).
  4. The CPU allocated to the head pod (default: 1).
  5. The memory allocated to the head pod (--head-memory, default: 2Gi).
  6. The CPU allocated to each worker pod (default: 1).
  7. The memory allocated to each worker pod (--worker-memory, default: 1Gi).
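As a minimal sketch of how the seven parameters and their defaults could fit together, the snippet below models them as a dataclass and assembles the command line. The class `RayJobArgs` is hypothetical, and the flag names `--head-cpu` and `--worker-cpu` are assumptions, since the usage example only shows the memory flags.

```python
from dataclasses import dataclass


@dataclass
class RayJobArgs:
    """Hypothetical container for the proposed rayjob parameters."""
    name: str
    image: str
    entrypoint: str
    head_cpu: str = "1"          # default from the list above
    head_memory: str = "2Gi"     # default from the list above
    worker_cpu: str = "1"        # default from the list above
    worker_memory: str = "1Gi"   # default from the list above

    def to_argv(self):
        # --head-cpu and --worker-cpu are assumed flag names.
        return [
            "arena", "submit", "rayjob",
            f"--name={self.name}",
            f"--image={self.image}",
            f"--head-cpu={self.head_cpu}",
            f"--head-memory={self.head_memory}",
            f"--worker-cpu={self.worker_cpu}",
            f"--worker-memory={self.worker_memory}",
            f"--entrypoint={self.entrypoint}",
        ]


# Only the three parameters without defaults are supplied here.
args = RayJobArgs(
    name="rayjob-sample",
    image="rayproject/ray:2.34.0",
    entrypoint="python /home/ray/samples/sample_code.py",
)
print(" ".join(args.to_argv()))
```

With this shape, a user would only have to pass the name, image, and entrypoint, and the resource settings fall back to the listed defaults.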