- An ensemble model represents a pipeline of models and the flow of data between them.
- This can reduce the overhead of transferring intermediate tensors and minimize the number of requests that must be sent to TorchServe.
Parameters when registering an ensemble model:

- model_name : Name of the model
- batch_size : Inference batch size
- max_batch_delay : Maximum delay for batch aggregation
- response_timeout : Inference response is expected within this timeout period
- handler : Ignored, as all orchestration will be in the frontend layer
- url : Ignored, as a MAR file is not needed
- initial_workers : Ignored, as this model will not have any workers
- synchronous : Ignored, as this model will not have any workers

The frontend will orchestrate requests/responses from the models in the pipeline.
Example: Inferencing with an ensemble model
M1, M2, M3 are pre-registered models.
- A simple JSON config representing this pipeline:
{"ensemble": ["M1", "M2", "M3"]}
- The frontend is responsible for:
  - Collecting the _input_ from the user's request (to the ensemble model) and initiating a request to M1.
  - Collecting the output from M1 and initiating a request to M2.
  - Collecting the output from M2 and initiating a request to M3.
  - Collecting the output from M3 and sending it back as the response to the end user (see the sketch after this list).
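To make the sequential flow concrete, here is a minimal sketch (illustrative only; the actual orchestration would live in TorchServe's Java frontend, not in client code) of the chaining described above, assuming each model is reachable through the standard inference API on localhost:8080:

```python
import requests

def ensemble_infer(user_input, pipeline=("M1", "M2", "M3"),
                   base_url="http://localhost:8080"):
    """Chain each model's output into the next model's request, as the frontend would."""
    data = user_input
    for model_name in pipeline:
        resp = requests.post(f"{base_url}/predictions/{model_name}", data=data)
        resp.raise_for_status()
        data = resp.content  # output of Mi becomes the input to Mi+1
    return data  # output of the last model is returned to the end user
```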
Topic | Suggestion
---|---
Should the ensemble model control the working/updating of the models in the pipeline? |
Should we support responding with outputs from intermediate models? | Yes, though this should be an optional feature.
Should we support nesting of ensemble models? | Yes, this could be a secondary feature based on the effort required.
Should we support pipeline structures similar to those shown below? | We could take this up in Phase II, as this would significantly increase the complexity and effort required for this feature.
@chauhang @dhanainme @maaquib - This is the approach we are thinking of. Let us know your thoughts.
Thanks @prashantsail for putting this together.
@chauhang Please see the comments below
- Can you also describe how the support for different pipelining options will be handled in future releases? Are you thinking of having something like workflows in AWS Step Functions?
The user will define a pipeline/workflow while registering the ensemble model. We are considering the following two approaches for how an end user would interface with TorchServe to register an ensemble model. The ensemble model will be registered as a logical model.
Approach A :
Use the existing model registration API with a new parameter named ensemble, which takes JSON data defining the pipeline. The registration API will ignore all other existing parameters except the model name when the ensemble parameter is supplied.
e.g. POST /models?model_name=xyx&ensemble={[["M1:v1", "M2:v2", "M3:v3"]]}
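For concreteness, a client-side sketch of this proposed call (note that the ensemble parameter is part of this proposal and does not exist in the current management API; the management endpoint is assumed to be on localhost:8081):

```python
import json
import requests

# "ensemble" is a proposed parameter, not an existing TorchServe management API parameter.
pipeline = [["M1:v1", "M2:v2", "M3:v3"]]
resp = requests.post(
    "http://localhost:8081/models",
    params={"model_name": "xyx", "ensemble": json.dumps(pipeline)},
)
print(resp.status_code, resp.text)
```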
Pros :
Approach B :
Add a new set of APIs to define a workflow, which take the model names and the flow between them as input.
E.g. POST /models/workflow?workflow_name=
Pros :
Cons :
- How will the batching get handled for the Ensemble models?
Again, there can be two approaches here:
Approach A: Supply batch_size while registering the ensemble model and update the batch size of every model in the pipeline before running inference, to ensure every model uses the same batch size.
Approach B: The ensemble model does not support batching and instead depends on the batch size configured for each model in the pipeline.
- What provision will be there for debugging the ensemble pipeline?
In case of failure, the inference response will include the model name and the related error from the point in the pipeline where inference failed. We will also see if we can return the intermediate output of the last model that completed inference successfully. The logs will also be enhanced to indicate whether an API call was made against a regular model or an ensemble model.
- Are we going to add new metrics for the ensemble case?
The pipeline will be treated as a logical model and the existing metrics mechanism will be reused; it will return metric data for the ensemble model as a whole as well as for the individual models in the pipeline. Note: the flow will be broken into sequential inference jobs by TorchServe.
@chauhang Please ignore the previous comment. We had an internal discussion today, and there may be a different way to approach this. We will update this ticket in a day or two.
@chauhang @lokeshgupta1975 @dhanainme @maaquib Based on the internal discussion, here are the updated approaches for ensemble support in TorchServe
Scope
Ensembles of models will support parallel inferencing with multiple models followed by an ensemble function(expressed in post-processing code) to return the inferred output.
Out of scope
Design considerations
Proposed Approach(s)
We are proposing the following design approaches to support ensemble in TorchServe.
Approach-1 (Recommended approach)
In this approach, ensemble model orchestration will be handled by the system-provided ensemble (default) handler itself. This is our recommended approach.
Pros
Cons
High-level design
Approach-2
In this approach, the TorchServe frontend layer will act as an orchestrator for ensemble model lifecycle management and inference. This approach will use the existing handler framework for loading the models present in the ensemble.
Pros
Cons
High-level design
@lxning
- Should the ensemble model require that each internal model's batch size be 1 if the client batches multiple requests together and sends them to the ensemble model?
The batching for any model is done at TorchServe's frontend layer. Every internal model of the ensemble will use the same batch size that is specified for the ensemble model at the time of registration.
2. Can requests to the ensemble model be mixed with regular requests to a model in that model's request batch?
The workers for the ensemble model will be independent of the regular workers and will only serve ensemble model requests.
Any model used in the ensemble can still have its own independent workers for serving regular inference requests.
As discussed with @chauhang @lokeshgupta1975 @dhanainme @maaquib @harshbafna, here is the final plan for adding workflow support to TorchServe.
Goal - Support ensembles of models.
Assumptions/notes
Based on the above design constraints/notes, we will introduce a new component for creating workflow archives, in addition to internal modules such as WFManager and WFExecutor.
Workflow Archiver [WAR] - This will be an independent CLI utility similar to the TorchServe model archiver [MAR] utility. The main purpose of this component is to create a WAR file from the supplied workflow specification [YAML] file, e.g.:
```yaml
models:
  # global model params
  min-workers: 1
  max-workers: 4
  batch-size: 8
  m1:
    url: model1.mar  # local or S3 path
    min-workers: 1   # override the global params
    max-workers: 2
    batch-size: 4
  m2:
    url: model2.mar
  m3:
    url: model3.mar
    batch-size: 2
  m4:
    url: model4.mar
dag:  # can have only one start node and one end node
  pre_processing: [m1, m3]
  m1: [m2, m4]
  m2: [post_processing]
  m4: [post_processing]
  m3: [post_processing]
```
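To illustrate the "only one start node and one end node" constraint on the dag section, here is a minimal sketch (illustrative only, not part of the proposed implementation) of how it could be validated after parsing the YAML above:

```python
def validate_dag(dag):
    """Check that a workflow DAG (node -> list of downstream nodes) has exactly
    one start node (no incoming edges) and one end node (no outgoing edges)."""
    nodes = set(dag) | {n for targets in dag.values() for n in targets}
    has_incoming = {n for targets in dag.values() for n in targets}
    start_nodes = nodes - has_incoming   # nodes nothing points to
    end_nodes = nodes - set(dag)         # nodes with no outgoing edges
    if len(start_nodes) != 1 or len(end_nodes) != 1:
        raise ValueError(
            f"workflow DAG must have exactly one start and one end node; "
            f"found starts={sorted(start_nodes)}, ends={sorted(end_nodes)}"
        )
    return next(iter(start_nodes)), next(iter(end_nodes))

# For the spec above, validate_dag({"pre_processing": ["m1", "m3"],
#   "m1": ["m2", "m4"], "m2": ["post_processing"], "m4": ["post_processing"],
#   "m3": ["post_processing"]}) returns ("pre_processing", "post_processing").
```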
"""
Add all your entry level function which needs to be executed via workflow DAG.
"""
def pre_processing(data, context): pass
def post_processing(data, context): pass
def ensemble_reducer(data, context): pass
- _requirements.txt_ - This is to support custom Python packages required by workflow handler [if any]. This is an optional file.
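As an illustration of the entry-level functions in the handler file above, here is a minimal sketch of an ensemble reducer. The shape of `data` (assumed here to be a mapping from upstream node name to that node's batched output) and the use of `context` are assumptions for illustration; the actual contract is not finalized in this proposal.

```python
def ensemble_reducer(data, context):
    """Average the scores returned by the upstream models in the DAG.

    Assumes `data` looks like {"m2": [s1, s2, ...], "m3": [...], "m4": [...]},
    where each list holds one score per request in the batch.
    """
    model_outputs = list(data.values())
    num_requests = len(model_outputs[0])
    responses = []
    for i in range(num_requests):
        scores = [float(out[i]) for out in model_outputs]
        responses.append({"score": sum(scores) / len(scores)})
    return responses
```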
**High-level flow for workflow APIs**
![high level design - workflow-apis](https://user-images.githubusercontent.com/26479924/97465555-2dc35000-1968-11eb-8c36-76b7b6db8ece.png)
**High-level components view**
![high level design - workflow-High level design](https://user-images.githubusercontent.com/26479924/97467934-b93de080-196a-11eb-895b-d28240791387.png)
Different ensemble scenarios to be covered through workflows described above: https://docs.google.com/spreadsheets/d/1x_Rj5xczANznVRJBaMrU0Wkhy-z-uWasrS4bzkQ7s3c/edit?ts=5f882aa5#gid=0