open-mmlab / mmdeploy

OpenMMLab Model Deployment Framework
https://mmdeploy.readthedocs.io/en/latest/
Apache License 2.0

[Bug] Instance segmentation model conversion onnx error #2121

Closed magic-hya closed 1 year ago

magic-hya commented 1 year ago

Describe the bug

Converting an instance segmentation model to ONNX fails with an error.

Reproduction

from mmdeploy.apis import torch2onnx
from mmdeploy.backend.sdk.export_info import export2SDK

img = '/data/mmdet/mmdetection/demo/demo.jpg'
work_dir = '/data/mmdet/onnx'
save_file = 'end2end.onnx'
deploy_cfg = '/root/workspace/mmdeploy/configs/mmdet/instance-seg/instance-seg_sdk_dynamic.py'
model_cfg = '/data/mmdet/scnet_r50_fpn_20e_coco.py'
model_checkpoint = '/data/mmdet/scnet_r50_fpn_20e_coco-a569f645.pth'
device = 'cuda:0'

# 1. convert model to onnx
torch2onnx(img, work_dir, save_file, deploy_cfg, model_cfg, model_checkpoint, device)

Environment

05/26 09:47:37 - mmengine - INFO -

05/26 09:47:37 - mmengine - INFO - **********Environmental information**********
05/26 09:47:38 - mmengine - INFO - sys.platform: linux
05/26 09:47:38 - mmengine - INFO - Python: 3.8.16 (default, Mar  2 2023, 03:21:46) [GCC 11.2.0]
05/26 09:47:38 - mmengine - INFO - CUDA available: True
05/26 09:47:38 - mmengine - INFO - numpy_random_seed: 2147483648
05/26 09:47:38 - mmengine - INFO - GPU 0: Tesla T4
05/26 09:47:38 - mmengine - INFO - CUDA_HOME: /usr/local/cuda
05/26 09:47:38 - mmengine - INFO - NVCC: Cuda compilation tools, release 11.6, V11.6.124
05/26 09:47:38 - mmengine - INFO - GCC: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
05/26 09:47:38 - mmengine - INFO - PyTorch: 1.10.0
05/26 09:47:38 - mmengine - INFO - PyTorch compiling details: PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) oneAPI Math Kernel Library Version 2023.1-Product Build 20230303 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.2.3 (Git Hash 7336ca9f055cf1bfa13efb658fe15dc9b41f0740)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX512
  - CUDA Runtime 11.3
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
  - CuDNN 8.2
  - Magma 2.5.2
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.3, CUDNN_VERSION=8.2.0, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.10.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,

05/26 09:47:38 - mmengine - INFO - TorchVision: 0.11.0
05/26 09:47:38 - mmengine - INFO - OpenCV: 4.7.0
05/26 09:47:38 - mmengine - INFO - MMEngine: 0.7.3
05/26 09:47:38 - mmengine - INFO - MMCV: 2.0.0
05/26 09:47:38 - mmengine - INFO - MMCV Compiler: GCC 9.3
05/26 09:47:38 - mmengine - INFO - MMCV CUDA Compiler: 11.3
05/26 09:47:38 - mmengine - INFO - MMDeploy: 1.1.0+faf05fe
05/26 09:47:38 - mmengine - INFO -

05/26 09:47:38 - mmengine - INFO - **********Backend information**********
05/26 09:47:38 - mmengine - INFO - tensorrt:    8.2.4.2
05/26 09:47:38 - mmengine - INFO - tensorrt custom ops: Available
05/26 09:47:38 - mmengine - INFO - ONNXRuntime: None
05/26 09:47:38 - mmengine - INFO - ONNXRuntime-gpu:     1.8.1
05/26 09:47:38 - mmengine - INFO - ONNXRuntime custom ops:      Available
05/26 09:47:38 - mmengine - INFO - pplnn:       None
05/26 09:47:38 - mmengine - INFO - ncnn:        None
05/26 09:47:38 - mmengine - INFO - snpe:        None
05/26 09:47:38 - mmengine - INFO - openvino:    None
05/26 09:47:38 - mmengine - INFO - torchscript: 1.10.0
05/26 09:47:38 - mmengine - INFO - torchscript custom ops:      NotAvailable
05/26 09:47:38 - mmengine - INFO - rknn-toolkit:        None
05/26 09:47:38 - mmengine - INFO - rknn-toolkit2:       None
05/26 09:47:38 - mmengine - INFO - ascend:      None
05/26 09:47:38 - mmengine - INFO - coreml:      None
05/26 09:47:38 - mmengine - INFO - tvm: None
05/26 09:47:38 - mmengine - INFO - vacc:        None
05/26 09:47:38 - mmengine - INFO -

05/26 09:47:38 - mmengine - INFO - **********Codebase information**********
05/26 09:47:38 - mmengine - INFO - mmdet:       3.0.0
05/26 09:47:38 - mmengine - INFO - mmseg:       None
05/26 09:47:38 - mmengine - INFO - mmpretrain:  None
05/26 09:47:38 - mmengine - INFO - mmocr:       None
05/26 09:47:38 - mmengine - INFO - mmagic:      None
05/26 09:47:38 - mmengine - INFO - mmdet3d:     None
05/26 09:47:38 - mmengine - INFO - mmpose:      None
05/26 09:47:38 - mmengine - INFO - mmrotate:    None
05/26 09:47:38 - mmengine - INFO - mmaction:    None
05/26 09:47:38 - mmengine - INFO - mmrazor:     None

Error traceback

File /data/mmdet/mmdetection/mmdet/models/roi_heads/scnet_roi_head.py:517, in SCNetRoIHead.predict(self, x, rpn_results_list, batch_data_samples, rescale)
    510 # TODO: nms_op in mmcv need be enhanced, the bbox result may get
    511 #  difference when not rescale in bbox_head
    512 
    513 # If it has the mask branch, the bbox branch does not need
    514 # to be scaled to the original image scale, because the mask
    515 # branch will scale both bbox and mask at the same time.
    516 bbox_rescale = rescale if not self.with_mask else False
--> 517 results_list = self.predict_bbox(
    518     x=x,
    519     semantic_feat=semantic_feat,
    520     glbctx_feat=glbctx_feat,
    521     batch_img_metas=batch_img_metas,
    522     rpn_results_list=rpn_results_list,
    523     rcnn_test_cfg=self.test_cfg,
    524     rescale=bbox_rescale)
    526 if self.with_mask:
    527     results_list = self.predict_mask(
    528         x=x,
    529         semantic_heat=semantic_feat,
   (...)
    532         results_list=results_list,
    533         rescale=rescale)

TypeError: cascade_roi_head__predict_bbox() got an unexpected keyword argument 'semantic_feat'
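
The function name in the TypeError suggests that a replacement written for the cascade head's `predict_bbox` signature ended up receiving SCNet's extra keyword arguments. The toy sketch below reproduces that failure mode in plain Python. All class and function names in it are hypothetical stand-ins, and the way the replacement is installed on the subclass is an assumption made for illustration, not MMDeploy's actual rewriting logic.

```python
class CascadeHead:
    """Toy stand-in for a cascade-style RoI head (hypothetical)."""

    def predict_bbox(self, x, rpn_results_list, rescale=False):
        return f"cascade bbox for {x}"


class SCNetLikeHead(CascadeHead):
    """Toy subclass that passes extra context features to predict_bbox."""

    def predict(self, x):
        # Same call shape as in the traceback: extra keyword arguments.
        return self.predict_bbox(
            x=x,
            semantic_feat="sem",
            glbctx_feat="ctx",
            rpn_results_list=[],
            rescale=False)

    def predict_bbox(self, x, semantic_feat, glbctx_feat,
                     rpn_results_list, rescale=False):
        return f"scnet bbox for {x}"


def cascade_head__predict_bbox(self, x, rpn_results_list, rescale=False):
    """Export-time replacement that only knows the base-class signature."""
    return f"export-friendly bbox for {x}"


def try_predict(head):
    """Run predict() and return either the result or the TypeError text."""
    try:
        return head.predict("feat")
    except TypeError as exc:
        return f"TypeError: {exc}"


before = try_predict(SCNetLikeHead())

# Assumption: the replacement ends up serving the subclass as well.
SCNetLikeHead.predict_bbox = cascade_head__predict_bbox
after = try_predict(SCNetLikeHead())

print(before)
print(after)
```

Before the swap the subclass call succeeds; after it, the call fails because the replacement does not accept `semantic_feat`.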

RunningLeon commented 1 year ago

@magic-hya Hi, SCNet is not in the list of supported models: https://mmdeploy.readthedocs.io/en/latest/04-supported-codebases/mmdet.html#supported-models

magic-hya commented 1 year ago

@RunningLeon Hi, after switching to a supported model, the conversion succeeded. But when I run inference, the kernel crashes immediately. I have tried several models and they all hit the same problem.

code:
import cv2
from mmdeploy_runtime import Detector

img_path = '/data/mmdet/mmdetection/demo/demo.jpg'
img = cv2.imread(img_path)
detector = Detector(model_path='/data/mmdet/onnx-faster-rcnn', device_name='cuda', device_id=0)

log:
[2023-05-30 06:42:37.520] [mmdeploy] [info] [model.cpp:35] [DirectoryModel] Load model: "/data/mmdet/onnx-faster-rcnn"
[2023-05-30 06:42:37.759] [mmdeploy] [error] [ort_net.cpp:205] unhandled exception when creating ORTNet: OrtSessionOptionsAppendExecutionProvider_Cuda: Failed to load shared library
[2023-05-30 06:42:37.759] [mmdeploy] [error] [net_module.cpp:54] Failed to create Net backend: onnxruntime, config: { ...

2023-05-30 06:42:37.741984155 [E:onnxruntime:, provider_bridge_ort.cc:901 Ensure] Failed to load library libonnxruntime_providers_shared.so with error: libonnxruntime_providers_shared.so: cannot open shared object file: No such file or directory

RunningLeon commented 1 year ago

@magic-hya Hi, when building MMDeploy you have to provide the GPU build of the ONNX Runtime library, as described in the doc. For example, you should download onnxruntime-linux-x64-gpu-1.10.0.tgz.
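
The log above reports that `libonnxruntime_providers_shared.so` could not be opened. A minimal sketch of a pre-flight check follows, using only the standard library. It assumes the library is expected on `LD_LIBRARY_PATH`; the dynamic loader also consults other locations (for example the ld.so cache and rpath entries), so a miss here is a hint rather than proof. The helper names `find_shared_lib` and `ld_library_dirs` are made up for this example.

```python
import os
from pathlib import Path


def find_shared_lib(lib_name, search_dirs):
    """Return the first path in search_dirs that holds lib_name, else None."""
    for directory in search_dirs:
        candidate = Path(directory) / lib_name
        if candidate.is_file():
            return candidate
    return None


def ld_library_dirs(env=None):
    """Split LD_LIBRARY_PATH into a list of directories, skipping empty entries."""
    env = os.environ if env is None else env
    raw = env.get("LD_LIBRARY_PATH", "")
    return [d for d in raw.split(os.pathsep) if d]


# Library name taken from the error log in this thread.
lib = "libonnxruntime_providers_shared.so"
found = find_shared_lib(lib, ld_library_dirs())
print(f"{lib}: {found if found else 'not found on LD_LIBRARY_PATH'}")
```

If the library is reported as missing, adding the `lib` directory of the extracted ONNX Runtime GPU package to `LD_LIBRARY_PATH` before starting Python is the usual next thing to try.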

github-actions[bot] commented 1 year ago

This issue is marked as stale because it has been marked as invalid or awaiting response for 7 days without any further response. It will be closed in 5 days if the stale label is not removed or if there is no further response.