open-mmlab / mmdeploy

OpenMMLab Model Deployment Framework
https://mmdeploy.readthedocs.io/en/latest/
Apache License 2.0

[Bug] Inference fails at runtime after deploying with CUDA + TensorRT on Windows 10 #1312

Closed kc-w closed 1 year ago

kc-w commented 1 year ago

Describe the bug

I want to use TensorRT as the inference backend.

The following error is reported:

loading mmdeploy_execution ...
loading mmdeploy_cpu_device ...
loading mmdeploy_cuda_device ...
loading mmdeploy_graph ...
loading mmdeploy_directory_model ...
[2022-11-07 15:14:31.143] [mmdeploy] [info] [model.cpp:98] Register 'DirectoryModel'
loading mmdeploy_transform ...
loading mmdeploy_cpu_transform_impl ...
loading mmdeploy_cuda_transform_impl ...
loading mmdeploy_transform_module ...
loading mmdeploy_trt_net ...
loading mmdeploy_net_module ...
loading mmdeploy_mmcls ...
loading mmdeploy_mmdet ...
loading mmdeploy_mmseg ...
loading mmdeploy_mmocr ...
loading mmdeploy_mmedit ...
loading mmdeploy_mmpose ...
loading mmdeploy_mmrotate ...
loading mmdeploy_mmaction ...
[2022-11-07 15:14:31.287] [mmdeploy] [error] [model.cpp:45] no ModelImpl can read model E:\projectTest\mmdeploy\tools\end2end.engine
[2022-11-07 15:14:31.287] [mmdeploy] [error] [model.cpp:15] load model failed. Its file path is 'E:\projectTest\mmdeploy\tools\end2end.engine'
[2022-11-07 15:14:31.290] [mmdeploy] [error] [model.cpp:21] failed to create model: not supported (2)
failed to create segmentor, code: 6

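E:\projectTest\mmdeploy\build_tensorrt\bin\Release\image_segmentation.exe (process 173116) exited with code 1.
To automatically close the console when debugging stops, enable Tools -> Options -> Debugging -> Automatically close the console when debugging stops.
Press any key to close this window . . .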

Reproduction

Build the inference demo with the following command:

cmake .. -G "Visual Studio 16 2019" -A x64 -T v142 -DMMDEPLOY_BUILD_SDK=ON -DMMDEPLOY_BUILD_EXAMPLES=ON -DMMDEPLOY_BUILD_SDK_PYTHON_API=ON -DMMDEPLOY_TARGET_DEVICES="cuda" -DMMDEPLOY_TARGET_BACKENDS="trt" -Dpplcv_DIR="E:\projectTest\ppl.cv\pplcv-build\install\lib\cmake\ppl" -DTENSORRT_DIR="D:\TensorRT-8.4.1.5" -DCUDNN_DIR="D:\cudnn-8.4.1.50\lib"

Command used to run the inference program:

image_segmentation.exe cuda E:\projectTest\mmdeploy\tools\end2end.engine D:\images\mask\img_dir\test\Image_20220626155531345.jpg

Environment

2022-11-02 15:25:04,337 - mmdeploy - INFO - 

2022-11-02 15:25:04,337 - mmdeploy - INFO - **********Environmental information**********
2022-11-02 15:25:12,131 - mmdeploy - INFO - sys.platform: win32
2022-11-02 15:25:12,131 - mmdeploy - INFO - Python: 3.9.12 (tags/v3.9.12:b28265d, Mar 23 2022, 23:52:46) [MSC v.1929 64 bit (AMD64)]
2022-11-02 15:25:12,131 - mmdeploy - INFO - CUDA available: True
2022-11-02 15:25:12,131 - mmdeploy - INFO - GPU 0: NVIDIA GeForce RTX 3060
2022-11-02 15:25:12,131 - mmdeploy - INFO - CUDA_HOME: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.3
2022-11-02 15:25:12,131 - mmdeploy - INFO - NVCC: Cuda compilation tools, release 11.3, V11.3.109
2022-11-02 15:25:12,131 - mmdeploy - INFO - MSVC: Microsoft (R) C/C++ Optimizing Compiler Version 19.29.30146 for x64
2022-11-02 15:25:12,131 - mmdeploy - INFO - GCC: n/a
2022-11-02 15:25:12,131 - mmdeploy - INFO - PyTorch: 1.11.0+cu113
2022-11-02 15:25:12,131 - mmdeploy - INFO - PyTorch compiling details: PyTorch built with:
  - C++ Version: 199711
  - MSVC 192829337
  - Intel(R) Math Kernel Library Version 2020.0.2 Product Build 20200624 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.5.2 (Git Hash a9302535553c73243c632ad3c4c80beec3d19a1e)
  - OpenMP 2019
  - LAPACK is enabled (usually provided by MKL)
  - CPU capability usage: AVX2
  - CUDA Runtime 11.3
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
  - CuDNN 8.2
  - Magma 2.5.4
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.3, CUDNN_VERSION=8.2.0, CXX_COMPILER=C:/actions-runner/_work/pytorch/pytorch/builder/windows/tmp_bin/sccache-cl.exe, CXX_FLAGS=/DWIN32 /D_WINDOWS /GR /EHsc /w /bigobj -DUSE_PTHREADPOOL -openmp:experimental -IC:/actions-runner/_work/pytorch/pytorch/builder/windows/mkl/include -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DUSE_FBGEMM -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.11.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=OFF, USE_NNPACK=OFF, USE_OPENMP=ON, USE_ROCM=OFF, 

2022-11-02 15:25:12,131 - mmdeploy - INFO - TorchVision: 0.12.0+cu113
2022-11-02 15:25:12,131 - mmdeploy - INFO - OpenCV: 4.5.4
2022-11-02 15:25:12,131 - mmdeploy - INFO - MMCV: 1.6.2
2022-11-02 15:25:12,131 - mmdeploy - INFO - MMCV Compiler: MSVC 192930146
2022-11-02 15:25:12,131 - mmdeploy - INFO - MMCV CUDA Compiler: 11.3
2022-11-02 15:25:12,131 - mmdeploy - INFO - MMDeploy: 0.10.0+c4d428f
2022-11-02 15:25:12,132 - mmdeploy - INFO - 

2022-11-02 15:25:12,132 - mmdeploy - INFO - **********Backend information**********
2022-11-02 15:25:12,689 - mmdeploy - INFO - onnxruntime: 1.13.1 ops_is_avaliable : False
2022-11-02 15:25:12,726 - mmdeploy - INFO - tensorrt: 8.4.1.5   ops_is_avaliable : False
2022-11-02 15:25:12,850 - mmdeploy - INFO - ncnn: None  ops_is_avaliable : False
2022-11-02 15:25:12,854 - mmdeploy - INFO - pplnn_is_avaliable: False
2022-11-02 15:25:12,944 - mmdeploy - INFO - openvino_is_avaliable: True
2022-11-02 15:25:13,000 - mmdeploy - INFO - snpe_is_available: False
2022-11-02 15:25:13,005 - mmdeploy - INFO - ascend_is_available: False
2022-11-02 15:25:13,029 - mmdeploy - INFO - coreml_is_available: False
2022-11-02 15:25:13,029 - mmdeploy - INFO - 

2022-11-02 15:25:13,029 - mmdeploy - INFO - **********Codebase information**********
2022-11-02 15:25:13,108 - mmdeploy - INFO - mmdet:  None
2022-11-02 15:25:13,108 - mmdeploy - INFO - mmseg:  0.29.0
2022-11-02 15:25:13,108 - mmdeploy - INFO - mmcls:  None
2022-11-02 15:25:13,108 - mmdeploy - INFO - mmocr:  None
2022-11-02 15:25:13,108 - mmdeploy - INFO - mmedit: 0.16.0
2022-11-02 15:25:13,108 - mmdeploy - INFO - mmdet3d:    None
2022-11-02 15:25:13,108 - mmdeploy - INFO - mmpose: None
2022-11-02 15:25:13,108 - mmdeploy - INFO - mmrotate:   None

Error traceback

No response

mm-assistant[bot] commented 1 year ago

We recommend using English or English & Chinese for issues so that we could have broader discussion.

irexyc commented 1 year ago

Pass the model directory rather than the model file itself. In this case, pass E:\projectTest\mmdeploy\tools instead of E:\projectTest\mmdeploy\tools\end2end.engine.
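
As a minimal sketch of this advice, the helper below turns a path to a backend file into its containing directory before it is handed to the SDK. The function name `to_model_dir` and the suffix list are assumptions made for illustration and are not part of MMDeploy. `PureWindowsPath` is used so that the Windows paths from this issue parse the same way on any platform.

```python
from pathlib import PureWindowsPath

# Hypothetical list of backend file suffixes; extend as needed.
BACKEND_SUFFIXES = {".engine", ".onnx"}


def to_model_dir(path: str) -> str:
    """Return the directory to pass to the SDK for a given model path."""
    p = PureWindowsPath(path)
    # A path ending in a backend file suffix points at the model file,
    # so use the directory that contains it instead.
    if p.suffix.lower() in BACKEND_SUFFIXES:
        return str(p.parent)
    return str(p)


print(to_model_dir(r"E:\projectTest\mmdeploy\tools\end2end.engine"))
```

The check is purely textual (it looks at the suffix and does not touch the file system), so it cannot tell whether the directory actually holds everything the SDK expects.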

kc-w commented 1 year ago

I created a new directory and moved the engine file into it.

Command used:

image_segmentation.exe cuda E:\projectTest\mmdeploy\tools\engine D:\images\mask\img_dir\test\Image_20220626155531345.jpg

Error message:

loading mmdeploy_execution ...
loading mmdeploy_cpu_device ...
loading mmdeploy_cuda_device ...
loading mmdeploy_graph ...
loading mmdeploy_directory_model ...
[2022-11-07 15:32:23.985] [mmdeploy] [info] [model.cpp:98] Register 'DirectoryModel'
loading mmdeploy_transform ...
loading mmdeploy_cpu_transform_impl ...
loading mmdeploy_cuda_transform_impl ...
loading mmdeploy_transform_module ...
loading mmdeploy_trt_net ...
loading mmdeploy_net_module ...
loading mmdeploy_mmcls ...
loading mmdeploy_mmdet ...
loading mmdeploy_mmseg ...
loading mmdeploy_mmocr ...
loading mmdeploy_mmedit ...
loading mmdeploy_mmpose ...
loading mmdeploy_mmrotate ...
loading mmdeploy_mmaction ...
[2022-11-07 15:32:24.125] [mmdeploy] [error] [model.cpp:15] load model failed. Its file path is 'E:\projectTest\mmdeploy\tools\engine'
[2022-11-07 15:32:24.127] [mmdeploy] [error] [model.cpp:21] failed to create model: unknown (6)
failed to create segmentor, code: 6

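E:\projectTest\mmdeploy\build_tensorrt\bin\Release\image_segmentation.exe (process 176060) exited with code 1.
To automatically close the console when debugging stops, enable Tools -> Options -> Debugging -> Automatically close the console when debugging stops.
Press any key to close this window . . .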

irexyc commented 1 year ago

If you want to run inference with the SDK, you need to add --dump-info when converting the model. Please provide your conversion command.
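
For reference, a conversion command with --dump-info might be assembled as in the sketch below. It assumes the stock tools/deploy.py interface, which takes the deploy config, model config, checkpoint and image as positional arguments. The paths are the ones reported in this issue, the work directory name is a placeholder, and the helper name `build_convert_cmd` is hypothetical. The command is only printed, not executed.

```python
import shlex


def build_convert_cmd(deploy_cfg, model_cfg, checkpoint, img, work_dir, device="cuda"):
    """Assemble the argument list for tools/deploy.py with SDK info dumping enabled."""
    return [
        "python", "tools/deploy.py",
        deploy_cfg, model_cfg, checkpoint, img,
        "--work-dir", work_dir,
        "--device", device,
        "--dump-info",  # asks the converter to write the metadata the SDK needs
    ]


cmd = build_convert_cmd(
    "configs/mmseg/segmentation_tensorrt_dynamic-512x1024-2048x2048.py",
    "E:/projectTest/mmsegmentation/configs/pspnet/MyPsp.py",
    "E:/projectTest/mmsegmentation/result/iter_200.pth",
    "D:/images/mask/img_dir/train/Image_20220626151509704.jpg",
    work_dir="work_dir",  # placeholder output directory
)
print(shlex.join(cmd))
```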

kc-w commented 1 year ago

import argparse
import logging
import os

parser = argparse.ArgumentParser(description='Export model to backends.')
parser.add_argument('--deploy_cfg',
                    default='E:/projectTest/mmdeploy/configs/mmseg/segmentation_tensorrt_dynamic-512x1024-2048x2048.py',
                    help='deploy config path')
parser.add_argument('--model_cfg', default='E:/projectTest/mmsegmentation/configs/pspnet/MyPsp.py',
                    help='model config path')
parser.add_argument('--checkpoint', default='E:/projectTest/mmsegmentation/result/iter_200.pth',
                    help='model checkpoint path')
parser.add_argument('--img', default='D:/images/mask/img_dir/train/Image_20220626151509704.jpg',
                    help='image used to convert model')
parser.add_argument(
    '--test-img', default=None, help='image used to test model')
parser.add_argument(
    '--work-dir',
    default=os.getcwd(),
    help='the dir to save logs and models')
parser.add_argument(
    '--calib-dataset-cfg',
    help='dataset config path used to calibrate in int8 mode. If not \
        specified, it will use "val" dataset in model config instead.',
    default=None)
parser.add_argument(
    '--device', help='device used for conversion', default='cuda')
parser.add_argument(
    '--log-level',
    help='set log level',
    default='INFO',
    choices=list(logging._nameToLevel.keys()))
parser.add_argument(
    '--show', action='store_true', help='Show detection outputs')
parser.add_argument(
    '--dump-info', action='store_true', help='Output information for SDK')
parser.add_argument(
    '--quant-image-dir',
    default=None,
    help='Image directory for quantize model.')
parser.add_argument(
    '--quant', action='store_true', help='Quantize model to low bit.')
parser.add_argument(
    '--uri',
    default='192.168.1.1:60000',
    help='Remote ipv4:port or ipv6:port for inference on edge device.')
args = parser.parse_args()

irexyc commented 1 year ago

Given your modification of deploy.py, which hard-codes the arguments as defaults, you should also set the default value of --dump-info to True.

parser.add_argument(
    '--dump-info', action='store_true', default=True, help='Output information for SDK')
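
A quick standard-library check of what this change does: with action='store_true' and default=True, args.dump_info is True whether or not the flag is given on the command line, which suits a script that is run without arguments. The `make_parser` helper and the side-by-side comparison are only for illustration.

```python
import argparse


def make_parser(default):
    """Build a parser with only the --dump-info option, using the given default."""
    parser = argparse.ArgumentParser(description="Export model to backends.")
    parser.add_argument(
        "--dump-info", action="store_true", default=default,
        help="Output information for SDK")
    return parser


# Stock behaviour: the flag is off unless passed explicitly.
print(make_parser(False).parse_args([]).dump_info)               # False
print(make_parser(False).parse_args(["--dump-info"]).dump_info)  # True
# With default=True the value is True even when no arguments are given.
print(make_parser(True).parse_args([]).dump_info)                # True
```

One consequence of this choice is that the option can no longer be switched off from the command line, since store_true can only set the value to True.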

After converting the model, the model directory should have the following structure:

-- end2end.onnx
-- end2end.engine
-- deploy.json
-- detail.json
-- pipeline.json
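
Before launching the demo, the directory can be checked with a few lines of Python. This is a sketch under stated assumptions: the function name `missing_sdk_files` is hypothetical, and the expected file names are taken from the listing above rather than from any MMDeploy API.

```python
import tempfile
from pathlib import Path

# Metadata files named in the listing above.
REQUIRED_FILES = ("deploy.json", "pipeline.json")


def missing_sdk_files(model_dir, backend_suffix=".engine"):
    """Return the names of expected files that are absent from model_dir."""
    root = Path(model_dir)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    # Also expect at least one backend file, e.g. end2end.engine for TensorRT.
    if not any(root.glob(f"*{backend_suffix}")):
        missing.append(f"*{backend_suffix}")
    return missing


# Demonstration on a throwaway directory that holds only the engine file,
# which mirrors the second failed attempt in this thread.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "end2end.engine").touch()
    print(missing_sdk_files(tmp))  # ['deploy.json', 'pipeline.json']
```

An empty list means the directory has the files listed above; it does not verify that their contents are valid.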

kc-w commented 1 year ago

It works now, thanks.