kubeflow / arena

A CLI for Kubeflow.
Apache License 2.0

Feat: add support for distributed serving type #1187

Closed linnlh closed 1 week ago

linnlh commented 1 week ago

Purpose of this PR

This PR adds a new serving type, `distributed`, to Arena's serving module. The main motivation is to let users deploy large models that must be spread across multiple nodes in a Kubernetes (K8s) cluster.

Proposed changes:

Which issue(s) this PR fixes: Fixes #1186

Change Category

Rationale

The distributed serving type addresses the growing demand for multi-host inference driven by large language models (LLMs) such as Meta's Llama-3.1-405B. Arena currently cannot deploy a model that is distributed across multiple nodes, and this PR aims to fill that gap.
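To make the multi-host requirement concrete, here is a rough back-of-the-envelope sketch in Go. It is not part of this PR's code. The node shape (8 accelerators with 80 GB each) and the 2-byte weight precision are assumptions chosen for illustration, and the estimate counts model weights only, ignoring KV cache and activations.

```go
package main

import (
	"fmt"
	"math"
)

// weightGB returns the memory (in GB, 1 GB = 1e9 bytes) needed to hold
// the model weights alone. KV cache and activations are not included.
func weightGB(params, bytesPerParam float64) float64 {
	return params * bytesPerParam / 1e9
}

// minNodes returns the smallest number of nodes whose combined
// accelerator memory can hold needGB of weights.
func minNodes(needGB, nodeGB float64) int {
	return int(math.Ceil(needGB / nodeGB))
}

func main() {
	const params = 405e9      // parameter count of Llama-3.1-405B
	const bytesPerParam = 2.0 // assumption: 16-bit (BF16/FP16) weights
	const gpusPerNode = 8     // assumption: hypothetical node shape
	const gpuMemGB = 80.0     // assumption: 80 GB per accelerator

	need := weightGB(params, bytesPerParam)
	node := float64(gpusPerNode) * gpuMemGB

	// Prints: weights: 810 GB, one node: 640 GB, min nodes: 2
	fmt.Printf("weights: %.0f GB, one node: %.0f GB, min nodes: %d\n",
		need, node, minNodes(need, node))
}
```

Under these assumptions the weights alone exceed a single node's accelerator memory, which is why a serving type that spans several nodes is needed.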

linnlh commented 1 week ago

@ChenYi015 @cheyang @Syulin7

cheyang commented 1 week ago

@linnlh Please run the following commands to download the Go modules into the vendor directory.

go mod tidy
go mod vendor

google-oss-prow[bot] commented 1 week ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cheyang

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

- ~~[OWNERS](https://github.com/kubeflow/arena/blob/master/OWNERS)~~ [cheyang]

Approvers can indicate their approval by writing `/approve` in a comment.
Approvers can cancel approval by writing `/approve cancel` in a comment.