You can deploy the three models as three model services and connect them using the "Ensemble Models" functionality provided by Triton.
docs: https://github.com/triton-inference-server/server/blob/main/docs/architecture.md#ensemble-models
I see. If we deploy the three models as three services, then, if I understand correctly, we'll need to do some heavy lifting to manually fold the preprocessing and postprocessing steps into our deployment scripts. And we can find that processing code inside the predict_system.py offered in PaddleOCR?
Yes. In Triton there are actually three models and one service. The service connects the three models into a single pipeline using "Ensemble Models". When you send requests from the client, you only send them to the ensemble model, not to the other three models.
We have two NLP examples; they are in Chinese, but you can still refer to them:
https://github.com/PaddlePaddle/PaddleNLP/tree/develop/model_zoo/ernie-3.0/deploy/triton/models
The 'models' directory includes two "Ensemble Models": 'ernie_seqcls' and 'ernie_tokencls'.
As you can see in the configuration file of 'ernie_seqcls', it chains three models, 'ernie_tokenizer', 'ernie_seqcls_model', and 'ernie_seqcls_postprocess', each corresponding to a directory with the same name.
This is the 'ernie_seqcls' client request code: you just send the request to the 'ernie_seqcls' service.
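Roughly, a client request to the ensemble looks like the sketch below, using the Triton HTTP client. The model name, input/output tensor names, shape, and dtype here are just placeholders; they have to match what is declared in the ensemble's config.pbtxt.

```python
# Minimal sketch: send one request to an ensemble model via the Triton HTTP client.
# NOTE: the model/input/output names, shape, and dtype below are placeholders --
# they must match whatever the ensemble's config.pbtxt declares.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Example payload: a batch of one UTF-8 encoded sentence.
texts = np.array([["some input text".encode("utf-8")]], dtype=object)

inputs = [httpclient.InferInput("INPUT", texts.shape, "BYTES")]
inputs[0].set_data_from_numpy(texts)
outputs = [httpclient.InferRequestedOutput("OUTPUT")]

# The request goes only to the ensemble; Triton routes it through
# tokenizer -> model -> postprocess internally.
result = client.infer(model_name="ernie_seqcls", inputs=inputs, outputs=outputs)
print(result.as_numpy("OUTPUT"))
```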
@bdeng3 The first time you deal with a pipeline deployment it's a real hassle, but once you get it working, "Ensemble Models" works really well.
You need to modify the config file in conjunction with the Triton official documentation and the PaddleOCR inference script.
@heliqi I see, that example really helps!
One more question: I notice the Triton inference container nvcr.io/nvidia/tritonserver:21.10-py3 currently doesn't support the Paddle backend. So I guess if we followed the example closely, we'd need to convert our Paddle models to ONNX and install the Paddle environment inside the Docker container?
Would it be possible to use the paddlepaddle/triton_paddle:21.10 container image and serve an "ensemble paddle" model instead? I guess the advantage of this approach is that we could skip installing paddlepaddle inside the container and would no longer need to convert the Paddle models to ONNX?
@bdeng3 You can use the paddlepaddle/triton_paddle:21.10 container and change the backend in the example config to backend: "paddle".
Everything else in the example is the same, so you don't need to change it.
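After starting that container, you can also quickly check that everything loaded before sending real requests. This is just a sanity-check sketch, not part of the example; the model names are placeholders for whatever directories are in your model repository.

```python
# Sketch: confirm the server and the (hypothetically named) models are loaded
# after starting the paddle-enabled Triton container.
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")
assert client.is_server_ready()

# Placeholder names -- use the directory names from your own model repository.
for name in ["ocr_ensemble", "det_preprocess", "det_model", "det_postprocess"]:
    print(name, "ready:", client.is_model_ready(name))
```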
OK. Thanks again for the instructions!
To confirm: I need to change the config for the model to backend: "paddle", and leave the configs for preprocessing and postprocessing at backend: "python" unchanged?
@bdeng3 In the examples I provided, the preprocessing and postprocessing are unchanged and use backend: "python".
For your PaddleOCR case: if the preprocessing and postprocessing are implemented in Python, use backend: "python"; if they are implemented as PaddlePaddle models, use backend: "paddle".
Got it. Looking closely at the PaddleOCR inference code, there are a few preprocessing operations such as DetResizeForTest, NormalizeImage, ToCHWImage, and KeepKeys, so I think backend: "python" is the way to go.
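Something roughly like the model.py below is what I have in mind for the python-backend preprocessing model. The tensor names and the simplified resize/normalize transforms are just placeholders standing in for PaddleOCR's DetResizeForTest / NormalizeImage / ToCHWImage / KeepKeys operators, not the actual PaddleOCR code.

```python
# model.py sketch for a hypothetical "det_preprocess" model on Triton's python backend.
# Tensor names, shapes, and the simplified transforms are placeholders.
import cv2
import numpy as np
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def execute(self, requests):
        responses = []
        for request in requests:
            # Assumed input: a single HWC uint8 image tensor named "raw_image".
            img = pb_utils.get_input_tensor_by_name(request, "raw_image").as_numpy()

            # Rough stand-in for DetResizeForTest: make both sides multiples of 32.
            h, w = img.shape[:2]
            resize_h, resize_w = max(32, (h // 32) * 32), max(32, (w // 32) * 32)
            img = cv2.resize(img, (resize_w, resize_h))

            # NormalizeImage equivalent: scale to [0, 1], subtract mean, divide by std.
            mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
            std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
            img = (img.astype(np.float32) / 255.0 - mean) / std

            # ToCHWImage equivalent: HWC -> CHW, plus a batch dimension.
            img = np.expand_dims(img.transpose(2, 0, 1), axis=0)

            out = pb_utils.Tensor("preprocessed_image", img)
            responses.append(pb_utils.InferenceResponse(output_tensors=[out]))
        return responses
```

In the ensemble config, this model's "preprocessed_image" output would then be mapped to the detection model's input.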
Hey team,
I got a question on how to deploy PaddleOCR using Triton.
PaddleOCR consists of three models (detection, orientation classifier, and recognition), and some processing steps may exist between those models. I was wondering if we should wrap all three models into one "meta-model" and serve that "meta-model" in the Triton server? I was thinking about doing something like below. Or should we serve the three models separately? But given the complexity of the predict_system.py script provided by the PaddleOCR team, I feel we could reuse their inference script to connect the three models together. Serving the three models separately sounds like a rabbit hole. Any suggestions are very welcome!!