Closed jiaodong closed 2 years ago
Thanks @jiaodong ! Assigning to @edoakes @ericl @simon-mo for reviews
@zhe-thoughts for the process, should this be a PR as well? Or does the already merged #3 count?
@simon-mo : We should use PRs to finalize a REP (so it should mainly be the proposer and the shepherd working on the PR). Then we merge the PR, and the REP becomes a shepherded design proposal.
Then we use the issue to comment on the design proposal. How does it sound? Do you think it's easier to comment on the proposal as a PR?
It's still early in the process and we should iterate.
I would imagine the comment process involves the proposer iterating on the content to incorporate feedback from the reviewers as well.
We've gotten feedback the separate issue thing is quite confusing. I think we should just stick to routing comments to the main PR, whether or not it's merged. I pushed a change to the README to direct readers to do that.
Thanks @ericl , the process change sounds good to me. @simon-mo also gave that feedback. I think the only downside is that for Markdown, the experience of reviewing the PR could be a bit suboptimal. But overall I agree with making the change.
Summary - Serve Pipeline
General Motivation
Production machine learning serving pipelines are getting longer and wider. They often consist of multiple, or even tens of, models collectively making a final prediction, such as image / video content classification and tagging, fraud detection pipelines with multiple policies and models, multi-stage ranking and recommendation, etc.
Meanwhile, model sizes are growing beyond the memory limit of a single machine due to the exponentially growing number of parameters (for example GPT-3, or sparse feature embeddings in recsys models), so the ability to do disaggregated and distributed inference is desirable and future-proof.
We want to leverage the programmable, general-purpose distributed computing ability of Ray and double down on its unique strengths (scheduling, communication and shared memory) to facilitate authoring, orchestrating, scaling and deploying complex serving pipelines under one DAG API, so a user can program and test multiple models, or multiple shards of a single large model, dynamically, deploy to production at scale, and upgrade each individually.
Key requirements:
Should this change be within ray or outside?
Main ray project. Changes are made at the Ray Core and Ray Serve level.
Stewardship
Required Reviewers
The proposal will be open to the public, but please suggest a few experienced Ray contributors in this technical domain whose comments will help this proposal. Ideally, the list should include Ray committers.
@ericl, @edoakes, @simon-mo, @jiaodong
Shepherd of the Proposal (should be a senior committer)
To make the review process more productive, the owner of each proposal should identify a shepherd (should be a senior Ray committer). The shepherd is responsible for working with the owner and making sure the proposal is in good shape (with necessary information) before marking it as ready for broader review.
@ericl
Design and Architecture
Example - Diagram
We want to author a simple diamond-shaped DAG where user-provided input is sent to two models (m1, m2), each of which accesses partial or identical input, and part of the original input is also forwarded to the final ensemble stage to compute the final output.
Example - Code
Classes or functions decorated by ray can be directly used in Ray DAG building.
A DAG authored with Ray DAG API should be locally executable just by Ray Core runtime.
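The diamond shape described above can be sketched in plain Python to make the lazy "bind, then execute" idea concrete. This is a toy illustration only, not Ray's implementation or API: `ToyInput`, `ToyNode`, `toy_bind`, `pick`, and the arithmetic inside `m1`, `m2` and `ensemble` are all hypothetical stand-ins chosen for this example.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple


class ToyInput:
    """Placeholder for the user-provided input; filled in at execute time."""


@dataclass(frozen=True)
class ToyNode:
    """A lazily evaluated call: fn applied to args that may themselves be nodes."""

    fn: Callable[..., Any]
    args: Tuple[Any, ...]

    def execute(self, user_input: Any) -> Any:
        # Resolve upstream nodes first, then call this node's function.
        # Shared upstream nodes are recomputed; there is no caching in this toy.
        resolved = [_resolve(arg, user_input) for arg in self.args]
        return self.fn(*resolved)


def _resolve(arg: Any, user_input: Any) -> Any:
    if isinstance(arg, ToyInput):
        return user_input
    if isinstance(arg, ToyNode):
        return arg.execute(user_input)
    return arg  # plain constant


def toy_bind(fn: Callable[..., Any], *args: Any) -> ToyNode:
    """Record the call without running it (hypothetical analogue of binding)."""
    return ToyNode(fn, args)


def pick(index: int) -> Callable[[Any], Any]:
    """Return a function that selects one part of the original input."""
    def _pick(values: Any) -> Any:
        return values[index]
    return _pick


# Hypothetical stand-ins for the two models and the ensemble stage.
def m1(x: int) -> int:
    return x * 2


def m2(x: int) -> int:
    return x + 10


def ensemble(a: int, b: int, original_part: int) -> int:
    return a + b + original_part


# Build the diamond: input -> (m1, m2) -> ensemble, plus a direct edge
# carrying part of the original input to the ensemble stage.
dag_input = ToyInput()
m1_out = toy_bind(m1, toy_bind(pick(0), dag_input))
m2_out = toy_bind(m2, toy_bind(pick(1), dag_input))
dag = toy_bind(ensemble, m1_out, m2_out, toy_bind(pick(0), dag_input))

# Nothing has run so far; execution happens only here.
result = dag.execute((1, 2))
```

Separating graph construction from execution is what allows the same DAG to be run locally for testing and later handed to a different runtime for deployment.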
A Ray DAG can be built into a serve application that contains all nodes needed.
Configure individual deployments in app, with the same variable name used in ray_dag.
We reserve the name and generate a serve ingress deployment that takes care of HTTP / gRPC, input schema validation, adaptation, etc. It's our Python interface to configure pipeline ingress.
A serve pipeline application can be built into a YAML file for structured deployment, and is configurable by the Ops team by directly mutating configurable fields, without deep knowledge of or involvement in the model code in the pipeline.
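As a rough sketch of the "Ops team mutates configurable fields" idea, the snippet below overrides one field of one deployment by name without touching any model code. The config shape and field names (`deployments`, `name`, `num_replicas`) and the helper `override_deployment` are assumptions made for illustration, not the schema this proposal defines; JSON from the standard library stands in for YAML to keep the example dependency-free.

```python
import copy
import json
from typing import Any, Dict

# Hypothetical structured config for a built pipeline application.
app_config: Dict[str, Any] = {
    "deployments": [
        {"name": "m1", "num_replicas": 1},
        {"name": "m2", "num_replicas": 1},
        {"name": "ensemble", "num_replicas": 1},
    ]
}


def override_deployment(
    config: Dict[str, Any], name: str, **fields: Any
) -> Dict[str, Any]:
    """Return a copy of config with the named deployment's fields updated."""
    updated = copy.deepcopy(config)
    for deployment in updated["deployments"]:
        if deployment["name"] == name:
            deployment.update(fields)
            return updated
    raise KeyError(f"no deployment named {name!r}")


# Scale m2 only; the original config and the model code are left untouched.
scaled = override_deployment(app_config, "m2", num_replicas=4)
serialized = json.dumps(scaled, indent=2)
```

Keying overrides by deployment name mirrors the text's point that deployments are addressed by the same names used when the DAG was authored.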
Compatibility, Deprecation, and Migration Plan
An important part of the proposal is to explicitly point out any compatibility implications of the proposed change. If there are any, we should thoroughly discuss a plan to deprecate existing APIs and migrate to the new one(s).
Ray Core
.bind() method on ray decorated function or class.
Ray Serve
Deployment and class instances with deployment's RayServeHandle for better compatibility, deprecation as well as migration.
Breaking Changes: Ray Serve
build() call.
Ingress component for serve pipeline.
Deprecation
Migration Plan: Ray Serve
Ingress and Serve App APIs later on.
Test Plan and Acceptance Criteria
The proposal should discuss how the change will be tested before it can be merged or enabled. It should also include other acceptance criteria including documentation and examples.
(Optional) Follow-on Work