open-mmlab / mmdeploy

OpenMMLab Model Deployment Framework
https://mmdeploy.readthedocs.io/en/latest/
Apache License 2.0

CRNN TensorRT model batch inference result is strange #1128

Closed huliang2016 closed 1 year ago

huliang2016 commented 2 years ago

Describe the bug

After converting the CRNN model from PyTorch to TensorRT, I would like to use batch inference to speed things up, but the inference results look strange.

The original image:

test

If we set batch_size to 2 or 16, the result seems reasonable.

image image

But when we set batch_size to 3 or 12, the result seems strange.

image image

Reproduction

step0. Modify /path_to_mmocr/configs/_base_/recog_pipelines/crnn_pipeline.py, setting test_pipeline.max_width=800.

step1. Convert the model with a command like this:

DEPLOY_CFG_PATH="./configs/mmocr/text-recognition/text-recognition_tensorrt-fp16_dynamic-32x32-32x640.py"
WORK_DIR="crnn_fp16"

# Model paths
MODEL_CFG_PATH="/path_to_mmocr/configs/textrecog/crnn/crnn_academic_dataset.py"
MODEL_CHECKPOINT_PATH="/path_to_mmocr/crnn_academic-a723a1c5.pth"

# Test image path
TEST_IMG=test.png
INPUT_IMG=${TEST_IMG}

# Specify the device
DEVICE="cuda"

python ./tools/deploy.py \
        ${DEPLOY_CFG_PATH} \
        ${MODEL_CFG_PATH} \
        ${MODEL_CHECKPOINT_PATH} \
        ${INPUT_IMG} \
        --test-img ${TEST_IMG} \
        --work-dir ${WORK_DIR} \
        --device ${DEVICE} \
        --log-level INFO \
        --show \
        --dump-info

step2. Modify the exported TRT_MODEL_PATH/pipeline.json to support batch inference by setting "is_batched": true, as this issue suggests.

step3. Test command:

import cv2
from mmdeploy_python import TextDetector, TextRecognizer

recognizer = TextRecognizer(model_path="crnn_fp16/", device_name="cuda", device_id=0)

image = cv2.imread("test.png")
for item in recognizer.batch([image] * 12):
    print(item[0])
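
Step 2 above can also be scripted rather than edited by hand. The following is only a minimal sketch: it assumes pipeline.json already contains an "is_batched" key somewhere in its structure and makes no assumption about where. The helper names enable_batching and patch_pipeline are hypothetical and not part of MMDeploy.

```python
import json
from pathlib import Path


def enable_batching(node):
    """Set every existing "is_batched" key to True; return how many were changed."""
    changed = 0
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "is_batched":
                if value is not True:
                    node[key] = True  # value update only, dict size is unchanged
                    changed += 1
            else:
                changed += enable_batching(value)
    elif isinstance(node, list):
        for item in node:
            changed += enable_batching(item)
    return changed


def patch_pipeline(path):
    """Rewrite a pipeline.json file in place; return the number of flags flipped."""
    path = Path(path)
    config = json.loads(path.read_text(encoding="utf-8"))
    changed = enable_batching(config)
    path.write_text(json.dumps(config, indent=4), encoding="utf-8")
    return changed
```

With the work dir used in step 1, a call such as patch_pipeline("crnn_fp16/pipeline.json") would apply the change. A return value of 0 means no flag was flipped, either because the key is absent or because it was already true.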

Environment

2022-09-29 09:22:43,889 - mmdeploy - INFO - **********Environmental information**********
2022-09-29 09:22:44,142 - mmdeploy - INFO - sys.platform: linux
2022-09-29 09:22:44,142 - mmdeploy - INFO - Python: 3.7.13 (default, Mar 29 2022, 02:18:16) [GCC 7.5.0]
2022-09-29 09:22:44,142 - mmdeploy - INFO - CUDA available: True
2022-09-29 09:22:44,142 - mmdeploy - INFO - GPU 0: NVIDIA A100-SXM4-80GB
2022-09-29 09:22:44,142 - mmdeploy - INFO - CUDA_HOME: /usr/local/cuda
2022-09-29 09:22:44,142 - mmdeploy - INFO - NVCC: Cuda compilation tools, release 11.3, V11.3.109
2022-09-29 09:22:44,142 - mmdeploy - INFO - PyTorch: 1.11.0
2022-09-29 09:22:44,142 - mmdeploy - INFO - PyTorch compiling details: PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) oneAPI Math Kernel Library Version 2021.4-Product Build 20210904 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.5.2 (Git Hash a9302535553c73243c632ad3c4c80beec3d19a1e)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 11.3
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
  - CuDNN 8.2
  - Magma 2.5.2
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.3, CUDNN_VERSION=8.2.0, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.11.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, 

2022-09-29 09:22:44,142 - mmdeploy - INFO - TorchVision: 0.12.0
2022-09-29 09:22:44,142 - mmdeploy - INFO - OpenCV: 4.5.4
2022-09-29 09:22:44,142 - mmdeploy - INFO - MMCV: 1.5.0
2022-09-29 09:22:44,142 - mmdeploy - INFO - MMCV Compiler: GCC 7.3
2022-09-29 09:22:44,142 - mmdeploy - INFO - MMCV CUDA Compiler: 11.3
2022-09-29 09:22:44,142 - mmdeploy - INFO - MMDeploy: 0.8.0+14e31fb
2022-09-29 09:22:44,142 - mmdeploy - INFO - 

2022-09-29 09:22:44,142 - mmdeploy - INFO - **********Backend information**********
2022-09-29 09:22:44,886 - mmdeploy - INFO - onnxruntime: 1.12.1 ops_is_avaliable : True
2022-09-29 09:22:44,975 - mmdeploy - INFO - tensorrt: 8.4.3.1   ops_is_avaliable : True
2022-09-29 09:22:45,028 - mmdeploy - INFO - ncnn: None  ops_is_avaliable : False
2022-09-29 09:22:45,048 - mmdeploy - INFO - pplnn_is_avaliable: False
2022-09-29 09:22:45,111 - mmdeploy - INFO - openvino_is_avaliable: True
2022-09-29 09:22:45,160 - mmdeploy - INFO - snpe_is_available: False
2022-09-29 09:22:45,185 - mmdeploy - INFO - ascend_is_available: False
2022-09-29 09:22:45,210 - mmdeploy - INFO - coreml_is_available: False
2022-09-29 09:22:45,210 - mmdeploy - INFO - 

2022-09-29 09:22:45,210 - mmdeploy - INFO - **********Codebase information**********
2022-09-29 09:22:45,212 - mmdeploy - INFO - mmdet:      2.20.0
2022-09-29 09:22:45,212 - mmdeploy - INFO - mmseg:      0.28.0
2022-09-29 09:22:45,212 - mmdeploy - INFO - mmcls:      0.23.0
2022-09-29 09:22:45,212 - mmdeploy - INFO - mmocr:      0.4.1
2022-09-29 09:22:45,212 - mmdeploy - INFO - mmedit:     0.15.2
2022-09-29 09:22:45,212 - mmdeploy - INFO - mmdet3d:    None
2022-09-29 09:22:45,212 - mmdeploy - INFO - mmpose:     0.25.1
2022-09-29 09:22:45,212 - mmdeploy - INFO - mmrotate:   None

Error traceback

No response

tpoisonooo commented 2 years ago

@lvhan028

irexyc commented 2 years ago

Hi, @huliang2016. Sorry for the late reply. I can't reproduce your problem.

While following your steps, I had to change a few things. First, since you changed test_pipeline.max_width to 800, I had to change the last dim of max_shape to 800 in deploy_cfg. Second, when converting the model, since CRNN takes a single-channel image as input according to this config, I had to use text-recognition_tensorrt-fp16_dynamic-1x32x32-1x32x640.py. To support batch inference, I also changed the batch dim of deploy_cfg to 1/8/16.

After converting the model, I used your test command to test with batch sizes from 1 to 16, and the results are all nideployismsorgreator.

Have you modified the model config, or did I miss something?

huliang2016 commented 2 years ago

Thanks for your reply.

Have you modified TRT_MODEL_PATH/pipeline.json to support batch inference by setting "is_batched": true, as this issue suggests?

irexyc commented 2 years ago

Yes, I added "is_batched": true

huliang2016 commented 2 years ago

Did you also import TextRecognizer from mmdeploy_python and run inference with recognizer.batch([image] * 12)?

irexyc commented 2 years ago

Yes, as I described above, I followed your steps except for some modifications needed to make the conversion succeed.

huliang2016 commented 2 years ago

That's strange... Do you have any suggestions? And what does your environment look like?

irexyc commented 2 years ago

Your log lists library versions such as tensorrt: 8.4.3.1 and MMDeploy: 0.8.0+14e31fb. I used the same versions as you.

Following your steps as written, the model can't be converted, so I modified the config. Have you also modified the config as I described above?

huliang2016 commented 2 years ago

Yes

My ./configs/mmocr/text-recognition/text-recognition_tensorrt-fp16_dynamic-1x32x32-1x32x640.py file looks like this:

_base_ = [
    './text-recognition_dynamic.py', '../../_base_/backends/tensorrt-fp16.py'
]
backend_config = dict(
    common_config=dict(max_workspace_size=1 << 32),
    model_inputs=[
        dict(
            input_shapes=dict(
                input=dict(
                    min_shape=[1, 1, 32, 800],
                    opt_shape=[32, 1, 32, 800],
                    max_shape=[256, 1, 32, 800])))
    ])
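
As a quick sanity check on the profile above: TensorRT only accepts input shapes whose every dimension lies between min_shape and max_shape. The sketch below assumes the preprocessed input is 1x32x800 per image (an assumption, since the actual preprocessed width is not shown in this thread). fits_profile is a hypothetical helper, not an MMDeploy or TensorRT function.

```python
def fits_profile(shape, min_shape, max_shape):
    """Return True if every dimension of shape lies within [min, max]."""
    if not (len(shape) == len(min_shape) == len(max_shape)):
        return False
    return all(lo <= dim <= hi for dim, lo, hi in zip(shape, min_shape, max_shape))


# Values copied from the config above.
MIN_SHAPE = [1, 1, 32, 800]
MAX_SHAPE = [256, 1, 32, 800]

# Batch sizes mentioned in the bug report.
for batch in (2, 3, 12, 16):
    print(batch, fits_profile([batch, 1, 32, 800], MIN_SHAPE, MAX_SHAPE))
```

Under that assumption, batch sizes 3 and 12 are inside the range just like 2 and 16, so the profile bounds alone would not explain why only some batch sizes misbehave.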

huliang2016 commented 2 years ago

@irexyc Could you please run this code in Jupyter or in a single Python file?

First, run:

for item in recognizer.batch([image] * 16):
    print(item[0])

After that, in the same kernel, run:

for item in recognizer.batch([image] * 12):
    print(item[0])
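
Because every image in the batch is identical, every result should be identical too, which makes the comparison easy to automate instead of reading printed output. This is a minimal sketch. find_outliers is a hypothetical helper, and it assumes item[0] is a hashable value such as the recognized text string.

```python
from collections import Counter


def find_outliers(results):
    """Return (index, value) pairs that differ from the most common result."""
    if not results:
        return []
    majority, _ = Counter(results).most_common(1)[0]
    return [(i, r) for i, r in enumerate(results) if r != majority]


# Intended use with the recognizer from the test command (not run here):
#   texts = [item[0] for item in recognizer.batch([image] * 12)]
#   print(find_outliers(texts))  # an empty list means all 12 results agree
```

Running this once for 16 images and once for 12 in the same process would show which batch positions, if any, return a different string.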

irexyc commented 2 years ago

I tried it, and the results are still the same.