triton-inference-server / paddlepaddle_backend


What's the best way to deploy PaddleOCR-v3 using triton server? #10

Closed bdeng3 closed 2 years ago

bdeng3 commented 2 years ago

Hey team,

I got a question on how to deploy PaddleOCR using triton.

PaddleOCR consists of three models (detection, orientation classifier, and recognition), and some processing steps sit between those models. I was wondering if we should wrap all three models into one "meta-model" and serve that "meta-model" in Triton server? I was thinking about doing something like below:

from paddleocr import PaddleOCR
from paddle.static import InputSpec
from paddle.jit import to_static
import paddle.nn as nn

class PaddleTritonModel(nn.Layer):
    def __init__(self):
        super(PaddleTritonModel, self).__init__()
        # Wrap the full PaddleOCR pipeline (det + cls + rec) in a single Layer.
        ocr = PaddleOCR(use_angle_cls=True, lang="en")
        self.ocr = ocr

    # Convert to a static graph with a dynamic-shape HWC image input.
    @to_static(input_spec=[InputSpec(shape=[None, None, 3], name="x")])
    def forward(self, img):
        result = self.ocr.ocr(img, cls=True)
        return result

Or should we serve the three models separately? Given the complexity of the predict_system.py script provided by the PaddleOCR team, I feel we could reuse their inference script to connect the three models together; serving the three models separately sounds like a rabbit hole. Any suggestions are very welcome!!

heliqi commented 2 years ago

You can deploy the three models as three services and connect them using the "Ensemble Models" functionality provided by Triton.

docs: https://github.com/triton-inference-server/server/blob/main/docs/architecture.md#ensemble-models

bdeng3 commented 2 years ago

I see. If we deploy the three models as three services, then, if I understand correctly, we'll need to do some heavy lifting to manually implement the preprocessing and postprocessing steps in our deployment scripts. And we can find that processing code inside the predict_system.py provided by PaddleOCR?

heliqi commented 2 years ago

Yes. In Triton there are actually three models plus one ensemble service. The ensemble connects the three models into a single pipeline using "Ensemble Models". When you send requests from the client, you only need to send them to the ensemble model, not to the three underlying models.
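
To make this concrete, here is a rough sketch of what a client request to such an ensemble could look like. The model name "ocr_pipeline" and the tensor names "image" / "ocr_results" are placeholders I made up; they have to match whatever you declare in your own ensemble config.pbtxt:

import numpy as np
import tritonclient.http as httpclient

# Placeholder names: ensemble "ocr_pipeline" with input "image" and output "ocr_results".
client = httpclient.InferenceServerClient(url="localhost:8000")

# Send the raw encoded image bytes; the preprocessing model inside the
# ensemble is then responsible for decoding and resizing them.
data = np.fromfile("sample.jpg", dtype=np.uint8).reshape(1, -1)

inp = httpclient.InferInput("image", list(data.shape), "UINT8")
inp.set_data_from_numpy(data)
out = httpclient.InferRequestedOutput("ocr_results")

# The client only talks to the ensemble; Triton routes the request through
# the detection, classification, and recognition models internally.
result = client.infer(model_name="ocr_pipeline", inputs=[inp], outputs=[out])
print(result.as_numpy("ocr_results"))

The NLP examples below follow the same pattern.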

We have two NLP examples. Although they are in Chinese, you can refer to them:

https://github.com/PaddlePaddle/PaddleNLP/tree/develop/model_zoo/ernie-3.0/deploy/triton/models

The 'models' directory includes two "Ensemble Models": 'ernie_seqcls' and 'ernie_tokencls'.

As you can see in the configuration file of 'ernie_seqcls', it chains together three models, 'ernie_tokenizer', 'ernie_seqcls_model', and 'ernie_seqcls_postprocess', each corresponding to a directory of the same name.

This is the 'ernie_seqcls' client request code: you just send the request to the 'ernie_seqcls' service.

heliqi commented 2 years ago

@bdeng3 The first time you deal with a pipeline deployment, it's a real hassle. But once it works, "Ensemble Models" works really well.

You need to modify the config file in conjunction with the Triton official documentation and the PaddleOCR inference script.

bdeng3 commented 2 years ago

@heliqi I see, that example really helps!

One more question: I notice the Triton inference container nvcr.io/nvidia/tritonserver:21.10-py3 currently doesn't support the Paddle backend. So I guess if we closely follow the example, we'll need to convert our Paddle models to ONNX and install the Paddle environment inside the Docker container?

Is it possible to use the paddlepaddle/triton_paddle:21.10 container image and serve an "ensemble paddle" model instead? I guess the advantage of that approach is that we could skip installing paddlepaddle inside the container and no longer need to convert the Paddle models to ONNX?

heliqi commented 2 years ago

@bdeng3 Use the paddlepaddle/triton_paddle:21.10 container and change the backend in the example config to backend: "paddle". Everything else in the example stays the same, so you don't need to change it.

bdeng3 commented 2 years ago

OK. Thanks again for the instructions!

To confirm: I need to change the config for the model to backend: "paddle", and leave the configs for preprocessing and postprocessing at backend: "python" unchanged?

heliqi commented 2 years ago

@bdeng3 For the examples I provided, the preprocessing and postprocessing are unchanged and use backend: "python".

For your PaddleOCR example: if the preprocessing and postprocessing are implemented in Python, use backend: "python"; if they are implemented as PaddlePaddle models, use backend: "paddle".
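
If you go the Python route, each preprocessing/postprocessing model is just a directory with a config.pbtxt and a model.py. Below is a minimal sketch of such a model.py; the tensor names "raw_image" and "det_input" are placeholders that must match that model's config.pbtxt, and the actual preprocessing should be ported from the PaddleOCR inference code:

import numpy as np
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def execute(self, requests):
        responses = []
        for request in requests:
            img = pb_utils.get_input_tensor_by_name(request, "raw_image").as_numpy()

            # Port the PaddleOCR preprocessing here (resize, normalize, HWC -> CHW);
            # the two lines below are just a stand-in.
            img = img.astype(np.float32) / 255.0
            img = np.transpose(img, (2, 0, 1))[np.newaxis, ...]

            out = pb_utils.Tensor("det_input", img)
            responses.append(pb_utils.InferenceResponse(output_tensors=[out]))
        return responses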

bdeng3 commented 2 years ago

Got it. I'm looking very closely at the PaddleOCR inference code; there are a few preprocessing operations such as DetResizeForTest, NormalizeImage, ToCHWImage, and KeepKeys, so I think backend: "python" should be the way to go.
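
Just for my own reference, here is roughly what those operators seem to boil down to in plain NumPy. The limit_side_len and the mean/std values below are the defaults I see in the detection config, so treat them as assumptions; KeepKeys only selects which entries of the data dict get passed along, so it doesn't need an equivalent here.

import cv2
import numpy as np

def det_resize_for_test(img, limit_side_len=960):
    # DetResizeForTest (roughly): shrink the longer side to limit_side_len,
    # then round height and width to multiples of 32.
    h, w = img.shape[:2]
    ratio = min(1.0, limit_side_len / max(h, w))
    new_h = max(32, int(round(h * ratio / 32)) * 32)
    new_w = max(32, int(round(w * ratio / 32)) * 32)
    return cv2.resize(img, (new_w, new_h))

def normalize_image(img, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)):
    # NormalizeImage: scale to [0, 1], then standardize per channel.
    img = img.astype(np.float32) / 255.0
    return (img - np.array(mean, np.float32)) / np.array(std, np.float32)

def to_chw_image(img):
    # ToCHWImage: HWC -> CHW, as the Paddle models expect.
    return img.transpose((2, 0, 1))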