ray-project / enhancements

Tracking Ray Enhancement Proposals
Apache License 2.0

REP-001: Serve Pipeline API #2

Closed jiaodong closed 2 years ago

jiaodong commented 2 years ago

Summary - Serve Pipeline

General Motivation

Production machine learning serving pipelines are getting longer and wider. They often consist of many models, sometimes tens of them, that collectively make a final prediction. Examples include image and video content classification and tagging, fraud detection pipelines with multiple policies and models, and multi-stage ranking and recommendation.

Meanwhile, model sizes are growing beyond the memory limit of a single machine because parameter counts are growing exponentially, as in GPT-3 or the sparse feature embeddings in recsys models. The ability to do disaggregated and distributed inference is therefore desirable and future-proof.

We want to leverage Ray's programmable, general-purpose distributed computing ability and double down on its unique strengths (scheduling, communication and shared memory) to facilitate authoring, orchestrating, scaling and deploying complex serving pipelines under one DAG API. A user can then program and test multiple models, or multiple shards of a single large model, dynamically, deploy them to production at scale, and upgrade them individually.

Key requirements:

Should this change be within ray or outside?

Main Ray project. Changes are made at the Ray Core and Ray Serve levels.

Stewardship

Required Reviewers

The proposal will be open to the public, but please suggest a few experienced Ray contributors in this technical domain whose comments will help this proposal. Ideally, the list should include Ray committers.

@ericl, @edoakes, @simon-mo, @jiaodong

Shepherd of the Proposal (should be a senior committer)

To make the review process more productive, the owner of each proposal should identify a shepherd (should be a senior Ray committer). The shepherd is responsible for working with the owner and making sure the proposal is in good shape (with necessary information) before marking it as ready for broader review.

@ericl

Design and Architecture

Example - Diagram

We want to author a simple diamond-shaped DAG in which user-provided input is sent to two models (m1, m2), each of which accesses partial or identical input, and part of the original input is also forwarded to the final ensemble stage to compute the final output.

               m1.forward(dag_input[0])
            /                          \
    dag_input ----- dag_input[2] ------ ensemble -> dag_output
            \                          /  
               m2.forward(dag_input[1])  

Example - Code

Classes or functions decorated with @ray.remote can be used directly when building a Ray DAG.

import ray
import starlette.requests

@ray.remote
class Model:
    def __init__(self, val):
        self.val = val

    def forward(self, input):
        return self.val * input

@ray.remote
def ensemble(a, b, c):
    return a + b + c

async def request_to_data_int(request: starlette.requests.Request):
    data = await request.body()
    return int(data)

# Args binding, DAG building and input preprocessor definition
with ServeInputNode(preprocessor=request_to_data_int) as dag_input:
    m1 = Model.bind(1)
    m2 = Model.bind(2)
    m1_output = m1.forward.bind(dag_input[0])
    m2_output = m2.forward.bind(dag_input[1])
    ray_dag = ensemble.bind(m1_output, m2_output, dag_input[2])

A DAG authored with the Ray DAG API should be locally executable using only the Ray Core runtime.

# 1*1 + 2*2 + 3
assert ray.get(ray_dag.execute(1, 2, 3)) == 8
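The deferred bind-then-execute behaviour above can be pictured with a small pure-Python sketch. This is only an illustration of the idea, not Ray's implementation: InputNode, InputItem and CallNode are hypothetical stand-ins, and everything runs in a single process with no Ray dependency.

```python
# Hypothetical, simplified stand-ins for illustration only.
# They are not part of Ray and do not schedule anything remotely.

class InputNode:
    """Placeholder for the arguments supplied at execute() time."""

    def __getitem__(self, index):
        return InputItem(index)


class InputItem:
    """Refers to one positional argument of the DAG input."""

    def __init__(self, index):
        self.index = index

    def resolve(self, args):
        return args[self.index]


class CallNode:
    """A deferred call whose arguments may themselves be nodes."""

    def __init__(self, fn, *bound_args):
        self.fn = fn
        self.bound_args = bound_args

    def resolve(self, args):
        # Resolve upstream nodes first, then apply this node's function.
        values = [a.resolve(args) if hasattr(a, "resolve") else a
                  for a in self.bound_args]
        return self.fn(*values)

    def execute(self, *args):
        return self.resolve(args)


class Model:
    def __init__(self, val):
        self.val = val

    def forward(self, x):
        return self.val * x


def ensemble(a, b, c):
    return a + b + c


# Build the diamond: nothing is computed until execute() is called.
dag_input = InputNode()
m1, m2 = Model(1), Model(2)
m1_output = CallNode(m1.forward, dag_input[0])
m2_output = CallNode(m2.forward, dag_input[1])
dag = CallNode(ensemble, m1_output, m2_output, dag_input[2])

result = dag.execute(1, 2, 3)  # 1*1 + 2*2 + 3
```

Because the graph only records which function consumes which upstream value, the same structure can be evaluated with different inputs without being rebuilt.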

A Ray DAG can be built into a Serve application that contains all the nodes needed.

# Build, configure and deploy
app = serve.pipeline.build(ray_dag)

Configure individual deployments in app, using the same variable names as in ray_dag.

app.m1.set_options(num_replicas=3)
app.m2.set_options(num_replicas=5)

We reserve the name ingress and generate a Serve ingress deployment that takes care of HTTP / gRPC, input schema validation, adaptation, etc. It is the Python interface for configuring the pipeline ingress.

app.ingress.set_options(num_replicas=10)

# Translates to group_deploy behind the scenes
app_handle = app.deploy()

# Serve App is locally executable
assert ray.get(app_handle.remote(1, 2, 3)) == 8
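As a rough mental model of the build and configure steps above, the application can be thought of as a registry of named deployments plus one reserved ingress entry. The sketch below is an assumption-laden illustration, not the proposed Serve API: PipelineApp, DeploymentConfig and to_dict are hypothetical names, and num_replicas is the only option taken from the proposal.

```python
from dataclasses import dataclass, field


@dataclass
class DeploymentConfig:
    """Hypothetical per-deployment settings holder."""
    name: str
    options: dict = field(default_factory=dict)

    def set_options(self, **kwargs):
        self.options.update(kwargs)


class PipelineApp:
    """Hypothetical stand-in for the object returned by the build step."""

    RESERVED = "ingress"

    def __init__(self, node_names):
        if self.RESERVED in node_names:
            raise ValueError(f"'{self.RESERVED}' is a reserved deployment name")
        self._deployments = {n: DeploymentConfig(n) for n in node_names}
        self._deployments[self.RESERVED] = DeploymentConfig(self.RESERVED)

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, so app.m1 maps to
        # the deployment registered under "m1".
        try:
            return self.__dict__["_deployments"][name]
        except KeyError:
            raise AttributeError(name) from None

    def to_dict(self):
        return {n: dict(d.options) for n, d in self._deployments.items()}


app = PipelineApp(["m1", "m2"])
app.m1.set_options(num_replicas=3)
app.m2.set_options(num_replicas=5)
app.ingress.set_options(num_replicas=10)
config = app.to_dict()
```

Reserving the ingress name up front means a user-defined node can never shadow the generated ingress deployment.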

A Serve pipeline application can be exported to a YAML file for structured deployment. The Ops team can then configure it by directly changing the configurable fields, without deep knowledge of, or involvement in, the model code in the pipeline.

# Assumes app.to_yaml() returns the YAML document as a string
with open("deployment.yaml", "w") as f:
    f.write(app.to_yaml())

# Structured deployment CLI
serve deploy deployment.yaml
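The separation between model code and operator-facing configuration might look like the following sketch. It assumes the exported file is loaded into a plain dictionary keyed by deployment name; apply_overrides and CONFIGURABLE_FIELDS are hypothetical names, and the actual schema is not specified in this proposal.

```python
import copy

# Assumption: only these fields are safe for operators to change.
CONFIGURABLE_FIELDS = {"num_replicas"}


def apply_overrides(config, overrides):
    """Return a copy of config with operator overrides applied."""
    updated = copy.deepcopy(config)
    for deployment, fields in overrides.items():
        if deployment not in updated:
            raise KeyError(f"unknown deployment: {deployment}")
        for key, value in fields.items():
            if key not in CONFIGURABLE_FIELDS:
                raise ValueError(f"field is not configurable: {key}")
            updated[deployment][key] = value
    return updated


# Stand-in for the contents of an exported deployment file.
base = {
    "ingress": {"num_replicas": 10},
    "m1": {"num_replicas": 3},
    "m2": {"num_replicas": 5},
}

# An operator scales m1 without touching any model code.
scaled = apply_overrides(base, {"m1": {"num_replicas": 6}})
```

Rejecting unknown deployments and fields early keeps a typo in an override from being silently ignored at deploy time.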

Compatibility, Deprecation, and Migration Plan

An important part of the proposal is to explicitly point out any compatibility implications of the proposed change. If there are any, we should thoroughly discuss a plan to deprecate existing APIs and migrate to the new one(s).

Test Plan and Acceptance Criteria

The proposal should discuss how the change will be tested before it can be merged or enabled. It should also include other acceptance criteria including documentation and examples.

(Optional) Follow-on Work

zhe-thoughts commented 2 years ago

Thanks @jiaodong ! Assigning to @edoakes @ericl @simon-mo for reviews

simon-mo commented 2 years ago

@zhe-thoughts for the process, should this be a PR as well? Or does the already-merged #3 count?

zhe-thoughts commented 2 years ago

@simon-mo : We should use PRs to finalize a REP (so it should mainly be the proposer and the shepherd working on the PR). Then we merge the PR, and the REP becomes a shepherded design proposal.

Then we use the issue to comment on the design proposal. How does it sound? Do you think it's easier to comment on the proposal as a PR?

It's still early in the process and we should iterate

simon-mo commented 2 years ago

I would imagine the comment process involves proposer iterating on the content to incorporate feedback from the reviewers as well.

ericl commented 2 years ago

We've gotten feedback the separate issue thing is quite confusing. I think we should just stick to routing comments to the main PR, whether or not it's merged. I pushed a change to the README to direct readers to do that.

zhe-thoughts commented 2 years ago

Thanks @ericl , the process change sounds good to me. @simon-mo also gave that feedback. I think the only downside is that for Markdown, the experience of reviewing the PR could be a bit suboptimal. But overall I agree with making the change.