open-mmlab / mmdeploy

OpenMMLab Model Deployment Framework
https://mmdeploy.readthedocs.io/en/latest/
Apache License 2.0

[Bug] swin transformer C++ SDK inference's question #2521

Closed chuzhixing closed 10 months ago

chuzhixing commented 11 months ago

Checklist

Describe the bug

I used the mmdeploy C++ SDK to deploy the Swin Transformer model, but the demo program got stuck at runtime and never finished. The output is as follows:

loading mmdeploy_trt_net.dll ...
loading mmdeploy_ort_net.dll ...
The given version [15] is not supported, only version 1 to 8 is supported in this build.
mmdeploy_model/swin_gpu
[2023-10-30 12:39:16.065] [mmdeploy] [info] [model.cpp:35] [DirectoryModel] Load model: "mmdeploy_model/swin_gpu"
[2023-10-30 12:39:18.454] [mmdeploy] [warning] [trt_net.cpp:24] TRTNet: Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
[2023-10-30 12:39:36.659] [mmdeploy] [warning] [trt_net.cpp:24] TRTNet: CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage and speed up TensorRT initialization. See "Lazy Loading" section of CUDA documentation https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#lazy-loading
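One low-risk thing the second warning above suggests trying (a sketch, not mmdeploy-specific advice; `CUDA_MODULE_LOADING` is a standard environment variable honored by CUDA 11.7 and later, and this setup uses CUDA 11.8):

```shell
# Enable CUDA lazy loading before launching the SDK demo.
export CUDA_MODULE_LOADING=LAZY     # Linux/macOS shell
# Windows cmd equivalent:
#   set CUDA_MODULE_LOADING=LAZY
echo "CUDA_MODULE_LOADING=$CUDA_MODULE_LOADING"
```

This only silences the lazy-loading warning and reduces startup memory/time; it is unlikely to be the cause of the hang by itself.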

Reproduction

Here is my model conversion script. The conversion appears to have succeeded, since the PyTorch and TensorRT inference images were written to the model folder.

python tools/deploy.py \
  configs/mmdet/instance-seg/instance-seg_tensorrt_dynamic-320x320-1344x1344.py \
  /my_proj/sperm/mmdetection_3.2.0/configs/swin/mask-rcnn_swin-t-p4-w7_fpn_1x_coco.py \
  /my_proj/sperm/mmdetection_3.2.0/checkpoints/mask_rcnn_swin-t-p4-w7_fpn_1x_coco_20210902_120937-9d6b7cfa.pth \
  /my_proj/sperm/mmdetection_3.2.0/demo/demo.jpg \
  --work-dir mmdeploy_model/swin_gpu \
  --device cuda \
  --dump-info
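Beyond the inference images, it can help to confirm the work dir actually holds the artifacts the SDK loads. The file names below are assumed from mmdeploy's default output layout for a TensorRT conversion with `--dump-info`:

```shell
# Check that the converted work dir contains everything the SDK expects
# (names are mmdeploy defaults; adjust if your layout differs).
for f in deploy.json pipeline.json detail.json end2end.onnx end2end.engine; do
  if [ -f "mmdeploy_model/swin_gpu/$f" ]; then
    echo "found   $f"
  else
    echo "missing $f"
  fi
done
```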

The C++ SDK code is as follows.

#include <iostream>
#include <cstdlib>
#include <chrono>
#include <opencv2/opencv.hpp>
#include "mmdeploy/detector.hpp"

#include "utils/visualize.h"

int main() {
    std::cout << "hello ...." << std::endl;
    const char* device_name = "cuda";

    // mmdeploy SDK model directory (the model converted above)
    std::string model_path = "mmdeploy_model/swin_gpu";

    std::string image_path = "img/demo.jpg";
    std::cout << model_path << std::endl;

    // 1. load the model
    mmdeploy::Model model(model_path);

    // 2. create the detector
    mmdeploy::Detector detector(model, mmdeploy::Device{ device_name });

    for (int i = 0; i < 5; i++) {
        auto start = std::chrono::high_resolution_clock::now();  // start time

        // 3. read the image
        cv::Mat img = cv::imread(image_path);
        // 4. run inference
        mmdeploy::Detector::Result dets = detector.Apply(img);
        // 5. handle the results: here we visualize them
        utils::Visualize v;
        v.set_palette(utils::Palette::get("coco"));
        auto sess = v.get_session(img);
        int count = 0;

        float FLAGS_det_thr = 0.3f;
        for (const mmdeploy_detection_t& det : dets) {
            if (det.score > FLAGS_det_thr) {  // filter bboxes
                sess.add_det(det.bbox, det.label_id, det.score, det.mask, count++);
            }
        }

        auto stop = std::chrono::high_resolution_clock::now();   // end time
        auto duration = std::chrono::duration_cast<std::chrono::microseconds>(stop - start);
        std::cout << "elapsed: " << duration.count() / 1000 << "ms" << std::endl;

        // cv::imwrite("output_detection_faster-rcnn.png", sess.get());
        // cv::imwrite("dev1.x_mmdeploy_sdk_1.2.0_rtm_ins_m.png", sess.get());
        cv::imwrite("a5.png", sess.get());
    }

    system("pause");
    return 0;
}

Environment

mmdeploy: 1.3.0  
mmdeploy SDK: https://github.com/open-mmlab/mmdeploy/releases/download/v1.3.0/mmdeploy-1.3.0-windows-amd64-cuda11.8.zip  
cuda: 11.8  
cudnn: v8.9.4  
TensorRT: TensorRT-8.6.1.6.Windows10.x86_64.cuda-11.8  

OS: windows 10  
Graphics card: NVIDIA T1000 8GB

Error traceback

No response

chuzhixing commented 11 months ago

Inference with the mmdeploy C++ SDK on the CPU works normally for the Swin Transformer model. When inferring on the GPU, the demo program gets stuck.

Is there any additional information that I need to provide?

The following is the environment information from the machine where the mmdeploy conversion was run.

(openmmlab_3_p38) root@99fa767e7990:/my_proj/sperm/mmdeploy_1.3.0# python tools/check_env.py 
10/31 17:15:08 - mmengine - INFO - 

10/31 17:15:08 - mmengine - INFO - **********Environmental information**********
10/31 17:15:09 - mmengine - INFO - sys.platform: linux
10/31 17:15:09 - mmengine - INFO - Python: 3.8.10 (default, Nov 26 2021, 20:14:08) [GCC 9.3.0]
10/31 17:15:09 - mmengine - INFO - CUDA available: True
10/31 17:15:09 - mmengine - INFO - numpy_random_seed: 2147483648
10/31 17:15:09 - mmengine - INFO - GPU 0,1,2,3,4,5,6: NVIDIA TITAN RTX
10/31 17:15:09 - mmengine - INFO - CUDA_HOME: /usr/local/cuda
10/31 17:15:09 - mmengine - INFO - NVCC: Cuda compilation tools, release 11.6, V11.6.55
10/31 17:15:09 - mmengine - INFO - GCC: x86_64-linux-gnu-gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
10/31 17:15:09 - mmengine - INFO - PyTorch: 1.13.1+cu116
10/31 17:15:09 - mmengine - INFO - PyTorch compiling details: PyTorch built with:
  - GCC 9.3
  - C++ Version: 201402
  - Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.6.0 (Git Hash 52b5f107dd9cf10910aaa19cb47f3abf9b349815)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 11.6
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86
  - CuDNN 8.3.2  (built against CUDA 11.5)
  - Magma 2.6.1
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.6, CUDNN_VERSION=8.3.2, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -fabi-version=11 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wunused-local-typedefs -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.13.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, 

10/31 17:15:09 - mmengine - INFO - TorchVision: 0.14.1+cu116
10/31 17:15:09 - mmengine - INFO - OpenCV: 4.8.0
10/31 17:15:09 - mmengine - INFO - MMEngine: 0.8.4
10/31 17:15:09 - mmengine - INFO - MMCV: 2.0.1
10/31 17:15:09 - mmengine - INFO - MMCV Compiler: GCC 9.3
10/31 17:15:09 - mmengine - INFO - MMCV CUDA Compiler: 11.6
10/31 17:15:09 - mmengine - INFO - MMDeploy: 1.3.0+
10/31 17:15:09 - mmengine - INFO - 

10/31 17:15:09 - mmengine - INFO - **********Backend information**********
10/31 17:15:09 - mmengine - INFO - tensorrt:    8.4.3.1
10/31 17:15:09 - mmengine - INFO - tensorrt custom ops: Available
10/31 17:15:09 - mmengine - INFO - ONNXRuntime: 1.15.1
10/31 17:15:09 - mmengine - INFO - ONNXRuntime-gpu:     None
10/31 17:15:09 - mmengine - INFO - ONNXRuntime custom ops:      Available
10/31 17:15:09 - mmengine - INFO - pplnn:       None
10/31 17:15:09 - mmengine - INFO - ncnn:        None
10/31 17:15:09 - mmengine - INFO - snpe:        None
10/31 17:15:09 - mmengine - INFO - openvino:    None
10/31 17:15:09 - mmengine - INFO - torchscript: 1.13.1+cu116
10/31 17:15:09 - mmengine - INFO - torchscript custom ops:      NotAvailable
10/31 17:15:09 - mmengine - INFO - rknn-toolkit:        None
10/31 17:15:09 - mmengine - INFO - rknn-toolkit2:       None
10/31 17:15:09 - mmengine - INFO - ascend:      None
10/31 17:15:09 - mmengine - INFO - coreml:      None
10/31 17:15:09 - mmengine - INFO - tvm: None
10/31 17:15:09 - mmengine - INFO - vacc:        None
10/31 17:15:09 - mmengine - INFO - 

10/31 17:15:09 - mmengine - INFO - **********Codebase information**********
10/31 17:15:09 - mmengine - INFO - mmdet:       3.2.0
10/31 17:15:09 - mmengine - INFO - mmseg:       None
10/31 17:15:09 - mmengine - INFO - mmpretrain:  None
10/31 17:15:09 - mmengine - INFO - mmocr:       None
10/31 17:15:09 - mmengine - INFO - mmagic:      None
10/31 17:15:09 - mmengine - INFO - mmdet3d:     None
10/31 17:15:09 - mmengine - INFO - mmpose:      None
10/31 17:15:09 - mmengine - INFO - mmrotate:    None
10/31 17:15:09 - mmengine - INFO - mmaction:    None
10/31 17:15:09 - mmengine - INFO - mmrazor:     None
10/31 17:15:09 - mmengine - INFO - mmyolo:      None

root@99fa767e7990:/# env
NV_LIBCUBLAS_VERSION=11.8.1.74-1
NVIDIA_VISIBLE_DEVICES=all
NV_NVML_DEV_VERSION=11.6.55-1
NV_LIBNCCL_DEV_PACKAGE=libnccl-dev=2.11.4-1+cuda11.6
NV_LIBNCCL_DEV_PACKAGE_VERSION=2.11.4-1
HOSTNAME=99fa767e7990
NVIDIA_REQUIRE_CUDA=cuda>=11.6 brand=tesla,driver>=418,driver<419 brand=tesla,driver>=440,driver<441 brand=tesla,driver>=450,driver<451 brand=tesla,driver>=460,driver<461 brand=tesla,driver>=470,driver<471
NV_LIBCUBLAS_DEV_PACKAGE=libcublas-dev-11-6=11.8.1.74-1
NV_NVTX_VERSION=11.6.55-1
NV_CUDA_CUDART_DEV_VERSION=11.6.55-1
NV_LIBCUSPARSE_VERSION=11.7.1.55-1
NV_LIBNPP_VERSION=11.6.0.55-1
NCCL_VERSION=2.11.4-1
PWD=/
NVIDIA_DRIVER_CAPABILITIES=compute,utility
NV_LIBNPP_PACKAGE=libnpp-11-6=11.6.0.55-1
NV_LIBNCCL_DEV_PACKAGE_NAME=libnccl-dev
NV_LIBCUBLAS_DEV_VERSION=11.8.1.74-1
NV_LIBCUBLAS_DEV_PACKAGE_NAME=libcublas-dev-11-6
VIRTUALENVWRAPPER_SCRIPT=/root/.local/bin/virtualenvwrapper.sh
NV_CUDA_CUDART_VERSION=11.6.55-1
HOME=/root
LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=00:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.zst=01;31:*.tzst=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.wim=01;31:*.swm=01;31:*.dwm=01;31:*.esd=01;31:*.jpg=01;35:*.jpeg=01;35:*.mjpg=01;35:*.mjpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.m4a=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.oga=00;36:*.opus=00;36:*.spx=00;36:*.xspf=00;36:
ONNXRUNTIME_DIR=/my_proj/library/onnxruntime-linux-x64-1.8.1
CUDA_VERSION=11.6.0
NV_LIBCUBLAS_PACKAGE=libcublas-11-6=11.8.1.74-1
VIRTUALENVWRAPPER_WORKON_CD=1
VIRTUALENVWRAPPER_PYTHON=/usr/bin/python3.8
NV_LIBNPP_DEV_PACKAGE=libnpp-dev-11-6=11.6.0.55-1
NV_LIBCUBLAS_PACKAGE_NAME=libcublas-11-6
NV_LIBNPP_DEV_VERSION=11.6.0.55-1
TENSORRT_DIR=/my_proj/library/TensorRT-8.6.1.6
WORKON_HOME=/root/.virtualenvs
LESSCLOSE=/usr/bin/lesspipe %s %s
TERM=xterm
NV_LIBCUSPARSE_DEV_VERSION=11.7.1.55-1
LESSOPEN=| /usr/bin/lesspipe %s
LIBRARY_PATH=/usr/local/cuda/lib64/stubs
VIRTUALENVWRAPPER_PROJECT_FILENAME=.project
SHLVL=1
NV_CUDA_LIB_VERSION=11.6.0-1
NVARCH=x86_64
NV_CUDA_COMPAT_PACKAGE=cuda-compat-11-6
NV_LIBNCCL_PACKAGE=libnccl2=2.11.4-1+cuda11.6
LD_LIBRARY_PATH=/my_proj/library/TensorRT-8.6.1.6/lib:/my_proj/library/onnxruntime-linux-x64-1.8.1/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
PATH=/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
VIRTUALENVWRAPPER_HOOK_DIR=/root/.virtualenvs
NV_LIBNCCL_PACKAGE_NAME=libnccl2
NV_LIBNCCL_PACKAGE_VERSION=2.11.4-1
_=/usr/bin/env
github-actions[bot] commented 11 months ago

This issue is marked as stale because it has been marked as invalid or awaiting response for 7 days without any further response. It will be closed in 5 days if the stale label is not removed or if there is no further response.

github-actions[bot] commented 10 months ago

This issue is closed because it has been stale for 5 days. Please open a new issue if you have similar issues or you have any new updates now.