triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
BSD 3-Clause "New" or "Revised" License

Is it possible to implement an ensemble with BLS? #7603

Closed ash2703 closed 2 months ago

ash2703 commented 2 months ago

I have a use case where I want to share the same input across multiple models so that they can run inference on it in parallel and independently. This is fully supported by ensembles, but I would also like to be selective about which models receive the input: for a given request, I may want only 1 of the 5 models to run inference.

The general flow is like this

graph TD
    A[Input: image] --> B[detector_preprocess]
    B --> |input| C[detector]
    B --> |preprocess_ratio| D[detector_postprocess]
    C --> |output| D
    D --> |boxes| E[recognizer_preprocess]
    A --> |image| E
    E --> |preprocessed_crops| F1[recognizer lang 1]
    E --> |preprocessed_crops| F2[recognizer lang 2]
    E --> |preprocessed_crops| F3[recognizer lang 3]
    E --> |preprocessed_crops| F4[recognizer lang 4]
    F1 --> |recognition_output| G[recognizer_postprocess]
    F2 --> |recognition_output| G
    F3 --> |recognition_output| G
    F4 --> |recognition_output| G
    G --> |decoded_text| H[Output: decoded_text]
    G --> |confidence_scores| I[Output: confidence_scores]

The flow above is possible via ensemble, but say for a given request I only wish to run recognizer lang 1 and recognizer lang 2. Is this possible via ensemble? If not, how can I leverage BLS while still keeping this flow?

https://github.com/triton-inference-server/server/issues/7589

ash2703 commented 2 months ago

@Tabrizian Sorry for tagging directly; any help is appreciated. Thanks!

gpadiolleau commented 2 months ago

I think it is possible to do that by selecting the model you want to infer in the request; check this simple example of how to select the model you want to run.
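The per-request routing idea can be sketched in plain Python. The mapping and function names below are hypothetical, not from any actual model repository; inside a real Triton Python (BLS) model, each selected name would be passed as the `model_name` of a `pb_utils.InferenceRequest`:

```python
# Hypothetical mapping from language codes to recognizer model names;
# adjust to match the names in your model repository.
LANG_TO_MODEL = {
    "en": "recognizer_lang_1",
    "fr": "recognizer_lang_2",
    "de": "recognizer_lang_3",
    "es": "recognizer_lang_4",
}

def select_recognizers(requested_langs):
    """Return the recognizer model names to run for this request.

    In a BLS model, each returned name would become the `model_name`
    of a pb_utils.InferenceRequest; languages that are not requested
    are simply never dispatched.
    """
    return [LANG_TO_MODEL[lang] for lang in requested_langs if lang in LANG_TO_MODEL]
```

The client would send the requested languages as an extra input tensor, and the BLS model would decode that tensor and call a selector like this before building its inference requests.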

ash2703 commented 2 months ago

So the idea is to treat recognizer_preprocess as a BLS within the ensemble? Essentially the ensemble only goes up to the preprocessing layer, and the recognizers are separate models outside the ensemble?

gpadiolleau commented 2 months ago

I think you can do the whole process in a BLS. I don't know whether it is possible to call ensemble models from a BLS, as I have never tried it, but I don't see why it wouldn't work.

You could have your recognizer_postprocess deal with zeroed inputs: just fill the unused recognizer outputs with zeros (or with anything else your postprocess will ignore) and it should work.
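The zero-fill idea can be sketched with NumPy. The output shape, dtype, and function name here are assumptions for illustration, not taken from the actual models:

```python
import numpy as np

# Assumed recognizer output shape: (num_crops, seq_len, vocab_size).
OUTPUT_SHAPE = (4, 32, 97)

def gather_recognizer_outputs(results, num_models=4):
    """Collect one output per recognizer slot for the postprocess step.

    `results` maps a recognizer index to its output array; slots that
    were skipped for this request are filled with zeros so that
    recognizer_postprocess always sees a fixed set of inputs.
    """
    outputs = []
    for i in range(num_models):
        if i in results:
            outputs.append(results[i])
        else:
            outputs.append(np.zeros(OUTPUT_SHAPE, dtype=np.float32))
    return outputs
```

The postprocess model would then need a cheap check (e.g. an all-zeros test, or a separate mask tensor) to skip decoding the placeholder slots.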

ash2703 commented 2 months ago

Thanks! This at least gave me the motivation to split the pipeline into two ensembles. Will keep you updated if the middle routing works.

Tabrizian commented 2 months ago

As @gpadiolleau mentioned, you can call whatever model you want in BLS (including ensembles), so you should be able to create that pipeline in BLS.
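The dispatch pattern this suggests (fire only the selected recognizers concurrently, then gather everything for postprocessing) can be mimicked in plain asyncio. The `infer` stand-in below is hypothetical; in a real async BLS `execute()` it would build a `pb_utils.InferenceRequest(model_name=...)` and `await request.async_exec()` instead:

```python
import asyncio

async def infer(model_name, inputs):
    """Stand-in for a BLS inference call; in a real async BLS execute()
    this would create a pb_utils.InferenceRequest for `model_name`
    (which may itself be an ensemble) and await async_exec().
    Here it just echoes its arguments for illustration."""
    await asyncio.sleep(0)  # yield control, as a real inference call would
    return {"model": model_name, "inputs": inputs}

async def run_selected(model_names, crops):
    # Dispatch all selected recognizers (or mini ensembles) concurrently
    # and wait for every result before postprocessing.
    return await asyncio.gather(*(infer(name, crops) for name in model_names))
```

`asyncio.gather` preserves the order of the requests, so results can be matched back to their recognizer slots by position.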

ash2703 commented 2 months ago

Thanks @gpadiolleau @Tabrizian, this worked! I created an ensemble up to recognizer_preprocess, plus mini ensembles for each recognizer + recognizer_postprocess, and a BLS router at the end of the first ensemble does the magic!
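One of those mini ensembles might look roughly like the config sketch below. All model names, tensor names, shapes, and dtypes are hypothetical placeholders; only the `platform: "ensemble"` / `ensemble_scheduling` structure follows Triton's model configuration format:

```protobuf
name: "recognizer_lang_1_pipeline"
platform: "ensemble"
max_batch_size: 8
input [
  { name: "preprocessed_crops" data_type: TYPE_FP32 dims: [ -1, -1 ] }
]
output [
  { name: "decoded_text" data_type: TYPE_STRING dims: [ -1 ] }
]
ensemble_scheduling {
  step [
    {
      model_name: "recognizer_lang_1"
      model_version: -1
      input_map { key: "INPUT" value: "preprocessed_crops" }
      output_map { key: "OUTPUT" value: "recognition_output" }
    },
    {
      model_name: "recognizer_postprocess"
      model_version: -1
      input_map { key: "recognition_output" value: "recognition_output" }
      output_map { key: "decoded_text" value: "decoded_text" }
    }
  ]
}
```

The BLS router at the end of the first ensemble would then call one of these `*_pipeline` ensembles per selected language.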