open-mmlab / mmdeploy

OpenMMLab Model Deployment Framework
https://mmdeploy.readthedocs.io/en/latest/
Apache License 2.0
[Bug] run ocr tensorrt model using DynamicBatch config #2228

Closed leemayi closed 1 year ago

leemayi commented 1 year ago

Describe the bug

An OCR image contains more than 20 targets. Running it through the SDK takes 500 ms, of which 450 ms is spent in the SATRN model. I see that the source code has a DynamicBatch method, but after converting the SATRN model to a TensorRT engine with a batch size of 10, I stepped through in the debugger and the isbatched variable (task.h) is still false.

So how can I reduce the time for character recognition?

Reproduction

None

Environment

06/29 19:24:02 - mmengine - INFO -

06/29 19:24:02 - mmengine - INFO - **********Environmental information**********
06/29 19:24:05 - mmengine - INFO - sys.platform: win32
06/29 19:24:05 - mmengine - INFO - Python: 3.8.16 (default, Jun 12 2023, 21:00:42) [MSC v.1916 64 bit (AMD64)]
06/29 19:24:05 - mmengine - INFO - CUDA available: True
06/29 19:24:05 - mmengine - INFO - numpy_random_seed: 2147483648
06/29 19:24:05 - mmengine - INFO - GPU 0: NVIDIA GeForce RTX 3060
06/29 19:24:05 - mmengine - INFO - CUDA_HOME: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6
06/29 19:24:05 - mmengine - INFO - NVCC: Cuda compilation tools, release 11.6, V11.6.112
06/29 19:24:05 - mmengine - INFO - MSVC: Microsoft (R) C/C++ Optimizing Compiler Version 19.33.31629 for x64
06/29 19:24:05 - mmengine - INFO - GCC: n/a
06/29 19:24:05 - mmengine - INFO - PyTorch: 1.12.1+cu116
06/29 19:24:05 - mmengine - INFO - PyTorch compiling details: PyTorch built with:
  - C++ Version: 199711
  - MSVC 192829337
  - Intel(R) Math Kernel Library Version 2020.0.2 Product Build 20200624 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.6.0 (Git Hash 52b5f107dd9cf10910aaa19cb47f3abf9b349815)
  - OpenMP 2019
  - LAPACK is enabled (usually provided by MKL)
  - CPU capability usage: AVX2
  - CUDA Runtime 11.6
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
  - CuDNN 8.3.2  (built against CUDA 11.5)
  - Magma 2.5.4
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.6, CUDNN_VERSION=8.3.2, CXX_COMPILER=C:/actions-runner/_work/pytorch/pytorch/builder/windows/tmp_bin/sccache-cl.exe, CXX_FLAGS=/DWIN32 /D_WINDOWS /GR /EHsc /w /bigobj -DUSE_PTHREADPOOL -openmp:experimental -IC:/actions-runner/_work/pytorch/pytorch/builder/windows/mkl/include -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DUSE_FBGEMM -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.12.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=OFF, USE_NNPACK=OFF, USE_OPENMP=ON, USE_ROCM=OFF,

06/29 19:24:05 - mmengine - INFO - TorchVision: 0.13.1+cu116
06/29 19:24:05 - mmengine - INFO - OpenCV: 4.7.0
06/29 19:24:05 - mmengine - INFO - MMEngine: 0.7.4
06/29 19:24:05 - mmengine - INFO - MMCV: 2.0.0
06/29 19:24:05 - mmengine - INFO - MMCV Compiler: MSVC 192829924
06/29 19:24:05 - mmengine - INFO - MMCV CUDA Compiler: 11.6
06/29 19:24:05 - mmengine - INFO - MMDeploy: 1.1.0+
06/29 19:24:05 - mmengine - INFO -

06/29 19:24:05 - mmengine - INFO - **********Backend information**********
06/29 19:24:05 - mmengine - INFO - tensorrt:    8.6.1
06/29 19:24:05 - mmengine - INFO - tensorrt custom ops: Available
06/29 19:24:05 - mmengine - INFO - ONNXRuntime: None
06/29 19:24:05 - mmengine - INFO - pplnn:       None
06/29 19:24:05 - mmengine - INFO - ncnn:        None
06/29 19:24:05 - mmengine - INFO - snpe:        None
06/29 19:24:05 - mmengine - INFO - openvino:    None
06/29 19:24:05 - mmengine - INFO - torchscript: 1.12.1+cu116
06/29 19:24:05 - mmengine - INFO - torchscript custom ops:      NotAvailable
06/29 19:24:05 - mmengine - INFO - rknn-toolkit:        None
06/29 19:24:05 - mmengine - INFO - rknn-toolkit2:       None
06/29 19:24:05 - mmengine - INFO - ascend:      None
06/29 19:24:05 - mmengine - INFO - coreml:      None
06/29 19:24:05 - mmengine - INFO - tvm: None
06/29 19:24:05 - mmengine - INFO - vacc:        None
06/29 19:24:05 - mmengine - INFO -

06/29 19:24:05 - mmengine - INFO - **********Codebase information**********
06/29 19:24:05 - mmengine - INFO - mmdet:       3.0.0
06/29 19:24:05 - mmengine - INFO - mmseg:       None
06/29 19:24:05 - mmengine - INFO - mmpretrain:  None
06/29 19:24:05 - mmengine - INFO - mmocr:       1.0.0
06/29 19:24:05 - mmengine - INFO - mmagic:      None
06/29 19:24:05 - mmengine - INFO - mmdet3d:     None
06/29 19:24:05 - mmengine - INFO - mmpose:      None
06/29 19:24:05 - mmengine - INFO - mmrotate:    None
06/29 19:24:05 - mmengine - INFO - mmaction:    None
06/29 19:24:05 - mmengine - INFO - mmrazor:     None

Error traceback

No response

AllentDan commented 1 year ago

Please provide the converting commands you used. Did you set the max batch_size of TensorRT in mmdeploy config and batch_size in deploy.json?

leemayi commented 1 year ago

the converting commands:

from mmdeploy.apis import torch2onnx
from mmdeploy.apis.tensorrt import onnx2tensorrt
from mmdeploy.backend.sdk.export_info import export2SDK
import os

# Raw string so the backslashes in the Windows path are not read as escapes
img = r'D:\mmdep\for_deploy_1.0\image\1.bmp'
work_dir = 'work_dir/trt/satrn_mydata3_fp16'
save_file = 'end2end.onnx'
deploy_cfg = 'mmdeploy-1.1.0/configs/mmocr/text-recognition/text-recognition_tensorrt-fp16_dynamic-10x32x100.py'
model_cfg = 'satrn_mydata3.py'
model_checkpoint = 'satrn_mydata3.pth'
device = 'cuda'

# Step 1: PyTorch checkpoint -> ONNX
torch2onnx(img, work_dir, save_file, deploy_cfg, model_cfg, model_checkpoint, device)

# Step 2: ONNX -> TensorRT engine
onnx_model = os.path.join(work_dir, save_file)
save_file = 'end2end.engine'
model_id = 0
device = 'cuda'
onnx2tensorrt(work_dir, save_file, model_id, deploy_cfg, onnx_model, device)

# Step 3: write the SDK meta files (deploy.json, detail.json, ...)
export2SDK(deploy_cfg, model_cfg, work_dir, pth=model_checkpoint, device=device)

deploy_cfg file:

_base_ = [
    './text-recognition_dynamic.py',
    '../../_base_/backends/tensorrt-fp16.py'
]
backend_config = dict(
    common_config=dict(max_workspace_size=1 << 30),
    model_inputs=[
        dict(
            input_shapes=dict(
                input=dict(
                    min_shape=[10, 3, 32, 100],
                    opt_shape=[10, 3, 32, 100],
                    max_shape=[10, 3, 32, 100])))
    ])

the deploy.json:

{
  "version": "1.1.0",
  "task": "TextRecognizer",
  "models": [
    {
      "name": "satrn",
      "net": "end2end.engine",
      "weights": "",
      "backend": "tensorrt",
      "precision": "FP16",
      "batch_size": 1,
      "dynamic_shape": true
    }
  ],
  "customs": [
    "dict_file.txt"
  ]
}

the detail.json:

{
  "version": "1.1.0",
  "codebase": {
    "task": "TextRecognition",
    "codebase": "mmocr",
    "version": "1.0.0",
    "pth": "satrn_mydata3.pth",
    "config": "satrn_mydata3.py"
  },
  "codebase_config": {
    "type": "mmocr",
    "task": "TextRecognition"
  },
  "onnx_config": {
    "type": "onnx",
    "export_params": true,
    "keep_initializers_as_inputs": false,
    "opset_version": 11,
    "save_file": "end2end.onnx",
    "input_names": ["input"],
    "output_names": ["output"],
    "input_shape": null,
    "optimize": true,
    "dynamic_axes": {
      "input": {"0": "batch", "3": "width"},
      "output": {"0": "batch", "1": "seq_len", "2": "num_classes"}
    }
  },
  "backend_config": {
    "type": "tensorrt",
    "common_config": {
      "fp16_mode": true,
      "max_workspace_size": 1073741824
    },
    "model_inputs": [
      {
        "input_shapes": {
          "input": {
            "min_shape": [10, 3, 32, 100],
            "opt_shape": [10, 3, 32, 100],
            "max_shape": [10, 3, 32, 100]
          }
        }
      }
    ]
  },
  "calib_config": {}
}

Question: Can I modify the value of the batch_size field in deploy.json?

AllentDan commented 1 year ago

Yes, for sure. The SDK configuration files are designed to be modified.
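For anyone who prefers to script this edit instead of changing the file by hand, here is a minimal sketch using only the Python standard library. The helper name `set_sdk_batch_size` is hypothetical (it is not part of mmdeploy); it only assumes the deploy.json layout posted above, where `batch_size` sits inside each entry of `models`.

```python
import json
from pathlib import Path


def set_sdk_batch_size(deploy_json_path, batch_size, model_index=0):
    """Rewrite batch_size for one model entry of an SDK deploy.json.

    Hypothetical helper, not an mmdeploy API. Assumes the layout shown
    in this thread: {"models": [{"batch_size": ...}, ...], ...}.
    """
    path = Path(deploy_json_path)
    meta = json.loads(path.read_text(encoding="utf-8"))
    meta["models"][model_index]["batch_size"] = int(batch_size)
    path.write_text(json.dumps(meta, indent=2), encoding="utf-8")
    return meta
```

Calling `set_sdk_batch_size('work_dir/trt/satrn_mydata3_fp16/deploy.json', 10)` would then set the first model's batch size to 10, assuming deploy.json was exported into that work directory.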

leemayi commented 1 year ago

I have now changed batch_size to 10, and the SDK reports an error:

[2023-06-30 12:47:23.857] [mmdeploy] [debug] [trt_net.cpp:175] input shape: (10, 3, 32, 100)
[2023-06-30 12:47:23.857] [mmdeploy] [debug] [trt_net.cpp:185] output shape: (10, 25, 93)
[2023-06-30 12:47:23.889] [mmdeploy] [debug] [trt_net.cpp:175] input shape: (8, 3, 32, 100)
[2023-06-30 12:47:23.890] [mmdeploy] [error] [trt_net.cpp:28] TRTNet: 3: [executionContext.cpp::nvinfer1::rt::ExecutionContext::validateInputBindings::2083] Error Code 3: API Usage Error (Parameter check failed at: executionContext.cpp::nvinfer1::rt::ExecutionContext::validateInputBindings::2083, condition: profileMinDims.d[i] <= dimensions.d[i]. Supplied binding dimension [8,3,32,100] for bindings[0] exceed min ~ max range at index 0, maximum dimension in profile is 10, minimum dimension in profile is 10, but supplied dimension is 8. )

the deploy.json:

{
  "version": "1.1.0",
  "task": "TextRecognizer",
  "models": [
    {
      "name": "satrn",
      "net": "end2end.engine",
      "weights": "",
      "backend": "tensorrt",
      "precision": "FP16",
      "batch_size": 10,
      "dynamic_shape": true
    }
  ],
  "customs": [
    "dict_file.txt"
  ]
}

the export config:

_base_ = [
    './text-recognition_dynamic.py',
    '../../_base_/backends/tensorrt-fp16.py'
]
backend_config = dict(
    common_config=dict(max_workspace_size=1 << 30),
    model_inputs=[
        dict(
            input_shapes=dict(
                input=dict(
                    min_shape=[10, 3, 32, 100],
                    opt_shape=[10, 3, 32, 100],
                    max_shape=[10, 3, 32, 100])))
    ])

AllentDan commented 1 year ago

As the error log says: "but supplied dimension is 8". You provided a Tensor with batch_size=8, not batch_size=10, and the engine profile only accepts a batch of exactly 10.
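The check that TensorRT reports failing here can be illustrated in a few lines of plain Python. This is only a sketch of the rule stated in the error message (each supplied dimension must lie between the profile's minimum and maximum); `fits_profile` is a hypothetical name, not a TensorRT or mmdeploy function.

```python
def fits_profile(shape, min_shape, max_shape):
    """Return True if every dimension lies within [min, max] of the profile."""
    if not (len(shape) == len(min_shape) == len(max_shape)):
        return False
    return all(lo <= d <= hi for d, lo, hi in zip(shape, min_shape, max_shape))


# Profile from the export config in this thread: min == max, so only batch 10 fits.
static_profile = dict(min_shape=[10, 3, 32, 100], max_shape=[10, 3, 32, 100])

print(fits_profile([10, 3, 32, 100], **static_profile))  # True
print(fits_profile([8, 3, 32, 100], **static_profile))   # False
```

The two shapes checked are the ones from the debug log: the first call with batch 10 passes, and the second call with batch 8 is rejected.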

leemayi commented 1 year ago

I understand this. But if I have two images, one with 18 targets and the other with 19 targets, how should I set the batch size? Can all the targets be predicted in one pass, regardless of how many there are?

AllentDan commented 1 year ago

You should set a minimum batch_size for TensorRT in this case.
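To see why the minimum matters, consider how 18 or 19 crops would be divided when the SDK batch_size is 10. The sketch below assumes the crops are cut into consecutive chunks of at most batch_size, which is consistent with the log above (one call with batch 10, then one with batch 8) but is not confirmed anywhere in this thread. Both function names are hypothetical.

```python
def split_into_batches(num_items, max_batch):
    """Return consecutive batch sizes covering num_items, each <= max_batch."""
    if max_batch < 1:
        raise ValueError("max_batch must be at least 1")
    full, rest = divmod(num_items, max_batch)
    return [max_batch] * full + ([rest] if rest else [])


def unsupported_batches(num_items, min_batch, max_batch):
    """Return the batch sizes that fall below the engine's minimum batch."""
    return [b for b in split_into_batches(num_items, max_batch) if b < min_batch]


print(split_into_batches(18, 10))       # [10, 8]
print(split_into_batches(19, 10))       # [10, 9]
print(unsupported_batches(18, 10, 10))  # [8]  -> rejected by a min=max=10 engine
print(unsupported_batches(18, 1, 10))   # []   -> every chunk fits a [1, 10] engine
```

Under that assumption, the leftover chunk can be any size from 1 to 9, so the engine's minimum batch would need to be 1 for an arbitrary number of targets to work.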

leemayi commented 1 year ago

like this:

_base_ = [
    './text-recognition_dynamic.py',
    '../../_base_/backends/tensorrt-fp16.py'
]
backend_config = dict(
    common_config=dict(max_workspace_size=1 << 30),
    model_inputs=[
        dict(
            input_shapes=dict(
                input=dict(
                    min_shape=[1, 3, 32, 100],
                    opt_shape=[5, 3, 32, 100],
                    max_shape=[10, 3, 32, 100])))
    ])

Does this config support a batch range of [1, 10]?

And should I then change batch_size in deploy.json to 10?

Am I understanding correctly?

leemayi commented 1 year ago

Sorry, I have a new question. When exporting the TensorRT model, the ONNX file was generated within 10 minutes, but an hour later the TensorRT engine still has not been generated and the process appears to be stuck.

(With a fixed batch size, i.e. identical min, opt and max shapes, the export usually takes me about 20 minutes.)

the log:

[06/30/2023-15:03:40] [TRT] [W] Running layernorm after self-attention in FP16 may cause overflow. Exporting the model to the latest available ONNX opset (later than opset 17) to use the INormalizationLayer, or forcing layernorm layers to run in FP32 precision can help with preserving accuracy.
[06/30/2023-15:03:42] [TRT] [I] Graph optimization time: 2.17072 seconds.
[06/30/2023-15:03:42] [TRT] [I] Local timing cache in use. Profiling results in this builder pass will not be stored.
(stuck here)

the backend config:

_base_ = [
    './text-recognition_dynamic.py',
    '../../_base_/backends/tensorrt-fp16.py'
]
backend_config = dict(
    common_config=dict(max_workspace_size=1 << 30),
    model_inputs=[
        dict(
            input_shapes=dict(
                input=dict(
                    min_shape=[1, 3, 32, 100],
                    opt_shape=[5, 3, 32, 100],
                    max_shape=[10, 3, 32, 100])))
    ])

AllentDan commented 1 year ago

It is normal for TensorRT to take a long time to convert some complex models. Dynamic batching is more complex than static batching.

github-actions[bot] commented 1 year ago

This issue is closed because it has been stale for 5 days. Please open a new issue if you have similar issues or you have any new updates now.