park12sj closed this issue 1 year ago
In my understanding, you need a single pod that has >=3 logical CPUs. In your case, you only have pods with 2 logical CPUs each, so the request cannot be placed.
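To illustrate the constraint above: a single Ray task or actor's resource request must fit entirely on one node; it is never split across pods. A minimal plain-Python sketch of that feasibility check (not Ray's actual scheduler, just the rule it enforces):

```python
# Illustration only: Ray places a single actor/task on exactly one node,
# so its whole resource request must fit on some single node.
def can_schedule(request_cpus: int, node_cpus: list[int]) -> bool:
    """True if at least one node can hold the entire request by itself."""
    return any(n >= request_cpus for n in node_cpus)

# Two worker pods with 2 logical CPUs each: 4 CPUs in total,
# but one request for 3 CPUs still cannot be placed anywhere.
print(can_schedule(3, [2, 2]))  # False
print(can_schedule(3, [4]))     # True
```

Aggregating resources across pods requires splitting the workload itself (e.g. into multiple actors via a placement group), not a single larger request.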
@Yicheng-Lu-llll
Isn't the idea to serve by pooling the resources of multiple pods?
For example, my ultimate goal is to serve models that require multiple GPUs for inference. I tried to cluster multiple worker pods, each with GPU resources, and do multi-GPU serving — is this impossible?
To sum up, I would like to do multi-node, multi-GPU serving for one large model.
Please take a look at Aviary — https://www.anyscale.com/blog/announcing-aviary-open-source-multi-llm-serving-solution — the GitHub repo has examples of how to set up multi-node/multi-GPU models with Ray Serve using placement groups.
KubeRay Component
Others