[Bug] Cannot convert mmaction2 model to trt with dynamic shape.

aixiaodewugege commented 1 year ago

Checklist

[X] 1. I have searched related issues but cannot get the expected help.
[X] 2. I have read the FAQ documentation but cannot get the expected help.
[X] 3. The bug has not been fixed in the latest version.

Describe the bug

I tried to convert the CSN model to TensorRT format, but the conversion fails when I use the dynamic input shape option.

Reproduction

python tools/deploy.py configs/mmaction/video-recognition/video-recognition_3d_dynamic.py ../mmaction2/configs/recognition/csn/ircsn_ig65m-pretrained-r152-bnfrozen_8xb12-32x2x1-58e_kinetics400-rgb.py ~/.cache/torch/hub/checkpoints/ircsn_ig65m_pretrained_bnfrozen_r152_32x2x1_58e_kinetics400_rgb_20200812-9037a758.pth ../mmaction2/demo/demo.mp4 --work-dir ../csn --dump-info --device cuda:0

I also tried changing opt_shape in the static config, but that failed as well.

Environment

03/28 21:53:33 - mmengine - INFO - **********Environmental information**********
03/28 21:53:33 - mmengine - INFO - sys.platform: linux
03/28 21:53:33 - mmengine - INFO - Python: 3.7.16 (default, Jan 17 2023, 22:20:44) [GCC 11.2.0]
03/28 21:53:33 - mmengine - INFO - CUDA available: True
03/28 21:53:33 - mmengine - INFO - numpy_random_seed: 2147483648
03/28 21:53:33 - mmengine - INFO - GPU 0: NVIDIA GeForce RTX 3090
03/28 21:53:33 - mmengine - INFO - CUDA_HOME: /usr/local/cuda
03/28 21:53:33 - mmengine - INFO - NVCC: Cuda compilation tools, release 11.7, V11.7.64
03/28 21:53:33 - mmengine - INFO - GCC: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
03/28 21:53:33 - mmengine - INFO - PyTorch: 1.13.1+cu117
03/28 21:53:33 - mmengine - INFO - PyTorch compiling details: PyTorch built with:
  - GCC 9.3
  - C++ Version: 201402
  - Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.6.0 (Git Hash 52b5f107dd9cf10910aaa19cb47f3abf9b349815)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 11.7
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86
  - CuDNN 8.8.1  (built against CUDA 11.8)
    - Built with CuDNN 8.5
  - Magma 2.6.1
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.7, CUDNN_VERSION=8.5.0, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -fabi-version=11 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wunused-local-typedefs -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.13.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, 

03/28 21:53:33 - mmengine - INFO - TorchVision: 0.14.1+cu117
03/28 21:53:33 - mmengine - INFO - OpenCV: 4.5.5
03/28 21:53:33 - mmengine - INFO - MMEngine: 0.6.0
03/28 21:53:33 - mmengine - INFO - MMCV: 2.0.0rc4
03/28 21:53:33 - mmengine - INFO - MMCV Compiler: GCC 9.3
03/28 21:53:33 - mmengine - INFO - MMCV CUDA Compiler: 11.7
03/28 21:53:33 - mmengine - INFO - MMDeploy: 1.0.0rc3+032ce75
03/28 21:53:33 - mmengine - INFO - 

03/28 21:53:33 - mmengine - INFO - **********Backend information**********
03/28 21:53:33 - mmengine - INFO - tensorrt:    8.4.2.4
03/28 21:53:33 - mmengine - INFO - tensorrt custom ops: Available
03/28 21:53:33 - mmengine - INFO - ONNXRuntime: None
03/28 21:53:33 - mmengine - INFO - ONNXRuntime-gpu:     1.8.1
03/28 21:53:33 - mmengine - INFO - ONNXRuntime custom ops:      Available
03/28 21:53:33 - mmengine - INFO - pplnn:       None
03/28 21:53:33 - mmengine - INFO - ncnn:        None
03/28 21:53:33 - mmengine - INFO - snpe:        None
03/28 21:53:33 - mmengine - INFO - openvino:    2022.3.0
03/28 21:53:33 - mmengine - INFO - torchscript: 1.13.1
03/28 21:53:33 - mmengine - INFO - torchscript custom ops:      NotAvailable
03/28 21:53:33 - mmengine - INFO - rknn-toolkit:        None
03/28 21:53:33 - mmengine - INFO - rknn-toolkit2:       None
03/28 21:53:33 - mmengine - INFO - ascend:      None
03/28 21:53:33 - mmengine - INFO - coreml:      None
03/28 21:53:33 - mmengine - INFO - tvm: None
03/28 21:53:33 - mmengine - INFO - vacc:        None
03/28 21:53:33 - mmengine - INFO - 

03/28 21:53:33 - mmengine - INFO - **********Codebase information**********
03/28 21:53:33 - mmengine - INFO - mmdet:       3.0.0rc6
03/28 21:53:33 - mmengine - INFO - mmseg:       None
03/28 21:53:33 - mmengine - INFO - mmcls:       None
03/28 21:53:33 - mmengine - INFO - mmocr:       None
03/28 21:53:33 - mmengine - INFO - mmedit:      None
03/28 21:53:33 - mmengine - INFO - mmdet3d:     None
03/28 21:53:33 - mmengine - INFO - mmpose:      1.0.0rc1
03/28 21:53:33 - mmengine - INFO - mmrotate:    None
03/28 21:53:33 - mmengine - INFO - mmaction:    1.0.0rc3
03/28 21:53:33 - mmengine - INFO - mmrazor:     None

Error traceback

Traceback (most recent call last):
  File "tools/deploy.py", line 335, in <module>
    main()
  File "tools/deploy.py", line 134, in main
    device=args.device)
  File "/home/wushuchen/projects/mmyolo/mmdeploy/mmdeploy/backend/sdk/export_info.py", line 347, in export2SDK
    deploy_info = get_deploy(deploy_cfg, model_cfg, work_dir, device)
  File "/home/wushuchen/projects/mmyolo/mmdeploy/mmdeploy/backend/sdk/export_info.py", line 263, in get_deploy
    deploy_cfg, model_cfg, work_dir=work_dir, device=device)
  File "/home/wushuchen/projects/mmyolo/mmdeploy/mmdeploy/backend/sdk/export_info.py", line 62, in get_model_name_customs
    model_cfg=model_cfg, deploy_cfg=deploy_cfg, device=device)
  File "/home/wushuchen/projects/mmyolo/mmdeploy/mmdeploy/apis/utils/utils.py", line 43, in build_task_processor
    check_backend_device(deploy_cfg=deploy_cfg, device=device)
  File "/home/wushuchen/projects/mmyolo/mmdeploy/mmdeploy/apis/utils/utils.py", line 21, in check_backend_device
    backend = get_backend(deploy_cfg).value
  File "/home/wushuchen/projects/mmyolo/mmdeploy/mmdeploy/utils/config_utils.py", line 133, in get_backend
    assert 'type' in backend_config, 'The backend config of deploy config'\
AssertionError: The backend config of deploy configrequires a "type" field
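
The assertion in the traceback fires when backend_config has no 'type' key. As a rough illustration only, the same condition can be checked on a plain dict before running the conversion. The helper name check_backend_config and the two sample configs are hypothetical and not part of mmdeploy; a real deploy config would first be loaded from its .py file.

```python
def check_backend_config(deploy_cfg: dict) -> str:
    """Return the backend type, or raise a readable error if it is missing.

    Hypothetical helper: it mirrors the condition asserted in the traceback
    ('type' in backend_config) but is not mmdeploy code.
    """
    backend_config = deploy_cfg.get('backend_config')
    if backend_config is None:
        raise KeyError('deploy config has no "backend_config" entry')
    if 'type' not in backend_config:
        raise KeyError('backend_config requires a "type" field, e.g. "tensorrt"')
    return backend_config['type']


# Illustrative configs (assumed for this sketch, not taken from the repo).
good_cfg = dict(backend_config=dict(type='tensorrt'))
bad_cfg = dict(backend_config=dict(common_config=dict(fp16_mode=False)))

print(check_backend_config(good_cfg))  # tensorrt
try:
    check_backend_config(bad_cfg)
except KeyError as err:
    print('invalid deploy config:', err)
```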

irexyc commented 1 year ago

A full deploy config with dynamic shape should look like this:

onnx_config = dict(
    type='onnx',
    export_params=True,
    keep_initializers_as_inputs=False,
    opset_version=11,
    save_file='end2end.onnx',
    input_names=['input'],
    output_names=['output'],
    input_shape=[256, 256],
    optimize=True,
    dynamic_axes=dict(
        input=dict({
            0: 'batch',
            1: 'num_crops * num_segs',
            3: 'time',
            4: 'height',
            5: 'width'
        }),
        output=dict({0: 'batch'})))
codebase_config = dict(type='mmaction', task='VideoRecognition')
backend_config = dict(
    type='tensorrt',
    common_config=dict(fp16_mode=False, max_workspace_size=1073741824),
    model_inputs=[
        dict(
            input_shapes=dict(
                input=dict(
                    min_shape=[1, 1, 3, 32, 256, 256],
                    opt_shape=[1, 64, 3, 32, 256, 256],
                    max_shape=[1, 128, 3, 32, 256, 256])))
    ])

There are two places you should focus on:

dynamic_axes
input_shapes

In mmaction2, dynamic input shapes are usually caused by SampleFrames and the xxxCrop transforms (such as ThreeCrop or CenterCrop). These two parts affect dim1 and dim3 (dim1 = num_clips * number of crops, dim3 = clip_len). Therefore, if you want to extract a different number of frames from different videos, you have to mark these two dims as dynamic in onnx_config and set appropriate min_shape/opt_shape/max_shape in the TensorRT backend_config (in this example, only dim1 is dynamic, ranging from 1 to 128).

If all of your inputs have the same width and height, there is no need to make dim4 and dim5 dynamic, so you can remove 4: 'height' and 5: 'width'.
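
To make the relationship between the shape ranges and the actual inputs concrete, here is a small sketch that uses only the min_shape/opt_shape/max_shape values from the config above. The helper names dynamic_dims and shape_in_profile are made up for illustration and are not mmdeploy or TensorRT APIs. It shows which axes the range really leaves free and whether a given input shape would fit.

```python
from typing import List, Sequence


def dynamic_dims(min_shape: Sequence[int], max_shape: Sequence[int]) -> List[int]:
    """Indices of the axes whose min and max differ, i.e. the axes that can vary."""
    return [i for i, (lo, hi) in enumerate(zip(min_shape, max_shape)) if lo != hi]


def shape_in_profile(shape: Sequence[int],
                     min_shape: Sequence[int],
                     max_shape: Sequence[int]) -> bool:
    """True if every axis of `shape` lies within [min, max] for that axis."""
    if not (len(shape) == len(min_shape) == len(max_shape)):
        return False
    return all(lo <= s <= hi for s, lo, hi in zip(shape, min_shape, max_shape))


# Values copied from the backend_config above.
MIN_SHAPE = [1, 1, 3, 32, 256, 256]
OPT_SHAPE = [1, 64, 3, 32, 256, 256]
MAX_SHAPE = [1, 128, 3, 32, 256, 256]

print(dynamic_dims(MIN_SHAPE, MAX_SHAPE))                                # [1]
print(shape_in_profile(OPT_SHAPE, MIN_SHAPE, MAX_SHAPE))                 # True
print(shape_in_profile([1, 30, 3, 32, 256, 256], MIN_SHAPE, MAX_SHAPE))  # True
print(shape_in_profile([1, 30, 3, 16, 256, 256], MIN_SHAPE, MAX_SHAPE))  # False
```

Note that with these ranges only axis 1 can vary, even though dynamic_axes also names axes 3, 4 and 5. An input with a different time length (axis 3) falls outside the range unless min_shape and max_shape are widened for that axis too.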

aixiaodewugege commented 1 year ago

Thanks for your reply!

To clarify: ThreeCrop is already inside your SDK, so the model input shape is defined by the ThreeCrop output size. So if I want to make height and width dynamic, I should change the ThreeCrop size in the model config file before converting. Is that right?

irexyc commented 1 year ago

You may be missing the point.

First, ask yourself which dimension you want to be dynamic. Is it the width and height of a video frame? Or is it the number of frames you want to extract from each video?

For width and height: if your transform pipeline contains a crop (whether CenterCrop or ThreeCrop), the image size after that transform is fixed (the crop_size). Therefore, there is no need to make height or width dynamic.

For a dynamic number of frames extracted from each video: dim1 = num_clips * number of crops, dim3 = clip_len.
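
As a worked example of that arithmetic, the sketch below builds the six-dimensional input shape from pipeline settings, using the axis layout of the deploy config earlier in the thread (index 1 is 'num_crops * num_segs', index 3 is 'time'). The function name and the sample values for num_clips are hypothetical. It also assumes that ThreeCrop yields three crops per frame, that CenterCrop yields one, and that the crop is square.

```python
from typing import List


def recognizer_input_shape(num_clips: int,
                           num_crops: int,
                           clip_len: int,
                           crop_size: int,
                           batch: int = 1,
                           channels: int = 3) -> List[int]:
    """Shape as [batch, num_clips * num_crops, channels, clip_len, height, width].

    Illustrative helper only; assumes a square crop of side `crop_size`.
    """
    return [batch, num_clips * num_crops, channels, clip_len, crop_size, crop_size]


# ThreeCrop (assumed 3 crops) with 10 sampled clips of 32 frames each.
print(recognizer_input_shape(num_clips=10, num_crops=3, clip_len=32, crop_size=256))
# [1, 30, 3, 32, 256, 256]

# CenterCrop (assumed 1 crop) with a single clip.
print(recognizer_input_shape(num_clips=1, num_crops=1, clip_len=32, crop_size=256))
# [1, 1, 3, 32, 256, 256]
```

Both results fit the 1 to 128 range given for axis 1 in the example config. Height and width stay at crop_size regardless of the source video resolution, which is why those axes do not need to be dynamic.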

aixiaodewugege commented 1 year ago

Got it. Thanks!

irexyc commented 1 year ago

I have closed the issue. If you have any other questions, feel free to reopen it.

open-mmlab / mmdeploy