open-mmlab / mmdeploy

OpenMMLab Model Deployment Framework
https://mmdeploy.readthedocs.io/en/latest/
Apache License 2.0
2.74k stars, 627 forks

[Bug] Strange problem encountered during model deployment #2436

Closed: Jianfeng777 closed this issue 1 year ago

Jianfeng777 commented 1 year ago

Checklist

Describe the bug

Encountered during model conversion:

(mmdeploy) jianfeng@Administrator:/mnt/e/AI/mmdeploy$ python tools/deploy.py /mnt/e/AI/mmdeploy/configs/mmdet/detection/detection_onnxruntime_dynamic.py /mnt/e/AI/mmdeploy/project/test/rtmdet_tiny_8xb32-300e_coco.py /mnt/e/AI/mmdeploy/project/test/rtmdet_tiny_8xb32-300e_coco_20220902_112414-78e30dcc.pth /mnt/e/AI/mmdeploy/project/test/demo.jpg --work-dir output/test --device cuda --dump-info
09/14 10:07:59 - mmengine - WARNING - Failed to search registry with scope "mmdet" in the "Codebases" registry tree. As a workaround, the current "Codebases" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmdet" is a correct scope, or whether the registry is initialized.
09/14 10:07:59 - mmengine - WARNING - Failed to search registry with scope "mmdet" in the "mmdet_tasks" registry tree. As a workaround, the current "mmdet_tasks" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmdet" is a correct scope, or whether the registry is initialized.
09/14 10:08:01 - mmengine - INFO - Start pipeline mmdeploy.apis.pytorch2onnx.torch2onnx in subprocess
09/14 10:08:01 - mmengine - WARNING - Failed to search registry with scope "mmdet" in the "Codebases" registry tree. As a workaround, the current "Codebases" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmdet" is a correct scope, or whether the registry is initialized.
09/14 10:08:01 - mmengine - WARNING - Failed to search registry with scope "mmdet" in the "mmdet_tasks" registry tree. As a workaround, the current "mmdet_tasks" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmdet" is a correct scope, or whether the registry is initialized.
Loads checkpoint by local backend from path: /mnt/e/AI/mmdeploy/project/test/rtmdet_tiny_8xb32-300e_coco_20220902_112414-78e30dcc.pth
The model and loaded state dict do not match exactly

unexpected key in source state_dict: data_preprocessor.mean, data_preprocessor.std

09/14 10:08:03 - mmengine - WARNING - DeprecationWarning: get_onnx_config will be deprecated in the future.
09/14 10:08:03 - mmengine - INFO - Export PyTorch model to ONNX: output/test/end2end.onnx.
/home/jianfeng/miniconda3/envs/mmdeploy/lib/python3.8/site-packages/mmdeploy/core/optimizers/function_marker.py:160: TracerWarning: Converting a tensor to a Python integer might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  ys_shape = tuple(int(s) for s in ys.shape)
/home/jianfeng/miniconda3/envs/mmdeploy/lib/python3.8/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3483.)
  return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
/home/jianfeng/miniconda3/envs/mmdeploy/lib/python3.8/site-packages/mmdeploy/mmcv/ops/nms.py:285: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  iou_threshold = torch.tensor([iou_threshold], dtype=torch.float32)
/home/jianfeng/miniconda3/envs/mmdeploy/lib/python3.8/site-packages/mmdeploy/mmcv/ops/nms.py:286: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  score_threshold = torch.tensor([score_threshold], dtype=torch.float32)
/home/jianfeng/miniconda3/envs/mmdeploy/lib/python3.8/site-packages/mmdeploy/pytorch/functions/topk.py:28: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  k = torch.tensor(k, device=input.device, dtype=torch.long)
/home/jianfeng/miniconda3/envs/mmdeploy/lib/python3.8/site-packages/mmdeploy/mmcv/ops/nms.py:44: TracerWarning: Converting a tensor to a Python float might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  score_threshold = float(score_threshold)
/home/jianfeng/miniconda3/envs/mmdeploy/lib/python3.8/site-packages/mmdeploy/mmcv/ops/nms.py:45: TracerWarning: Converting a tensor to a Python float might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  iou_threshold = float(iou_threshold)
/home/jianfeng/miniconda3/envs/mmdeploy/lib/python3.8/site-packages/mmcv/ops/nms.py:123: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert boxes.size(1) == 4
/home/jianfeng/miniconda3/envs/mmdeploy/lib/python3.8/site-packages/mmcv/ops/nms.py:124: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert boxes.size(0) == scores.size(0)
/home/jianfeng/miniconda3/envs/mmdeploy/lib/python3.8/site-packages/torch/onnx/symbolic_opset9.py:5589: UserWarning: Exporting aten::index operator of advanced indexing in opset 11 is achieved by combination of multiple ONNX operators, including Reshape, Transpose, Concat, and Gather. If indices include negative values, the exported graph will produce incorrect results.
  warnings.warn(
09/14 10:08:06 - mmengine - INFO - Execute onnx optimize passes.
09/14 10:08:06 - mmengine - WARNING - Can not optimize model, please build torchscipt extension. More details: https://github.com/open-mmlab/mmdeploy/tree/main/docs/en/experimental/onnx_optimizer.md
============= Diagnostic Run torch.onnx.export version 2.0.1+cu118 =============
verbose: False, log level: Level.ERROR
======================= 0 NONE 0 NOTE 0 WARNING 0 ERROR ========================

09/14 10:08:06 - mmengine - INFO - Finish pipeline mmdeploy.apis.pytorch2onnx.torch2onnx
09/14 10:08:07 - mmengine - INFO - Start pipeline mmdeploy.apis.utils.utils.to_backend in main process
09/14 10:08:07 - mmengine - INFO - Finish pipeline mmdeploy.apis.utils.utils.to_backend
09/14 10:08:07 - mmengine - INFO - visualize onnxruntime model start.
09/14 10:08:09 - mmengine - WARNING - Failed to search registry with scope "mmdet" in the "Codebases" registry tree. As a workaround, the current "Codebases" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmdet" is a correct scope, or whether the registry is initialized.
09/14 10:08:09 - mmengine - WARNING - Failed to search registry with scope "mmdet" in the "mmdet_tasks" registry tree. As a workaround, the current "mmdet_tasks" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmdet" is a correct scope, or whether the registry is initialized.
09/14 10:08:09 - mmengine - WARNING - Failed to search registry with scope "mmdet" in the "backend_detectors" registry tree. As a workaround, the current "backend_detectors" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmdet" is a correct scope, or whether the registry is initialized.
09/14 10:08:09 - mmengine - INFO - Successfully loaded onnxruntime custom ops from /home/jianfeng/miniconda3/envs/mmdeploy/lib/python3.8/site-packages/mmdeploy/lib/libmmdeploy_onnxruntime_ops.so
/home/jianfeng/miniconda3/envs/mmdeploy/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py:65: UserWarning: Specified provider 'CUDAExecutionProvider' is not in available provider names.Available providers: 'CPUExecutionProvider'
  warnings.warn(
2023-09-14:10:08:10 - root - ERROR - Error when binding input: There's no data transfer registered for copying tensors from Device:[DeviceType:1 MemoryType:0 DeviceId:0] to Device:[DeviceType:0 MemoryType:0 DeviceId:0]
Traceback (most recent call last):
  File "/home/jianfeng/miniconda3/envs/mmdeploy/lib/python3.8/site-packages/mmdeploy/utils/utils.py", line 41, in target_wrapper
    result = target(*args, **kwargs)
  File "/home/jianfeng/miniconda3/envs/mmdeploy/lib/python3.8/site-packages/mmdeploy/apis/visualize.py", line 72, in visualize_model
    result = model.test_step(model_inputs)[0]
  File "/home/jianfeng/miniconda3/envs/mmdeploy/lib/python3.8/site-packages/mmengine/model/base_model/base_model.py", line 145, in test_step
    return self._run_forward(data, mode='predict') # type: ignore
  File "/home/jianfeng/miniconda3/envs/mmdeploy/lib/python3.8/site-packages/mmengine/model/base_model/base_model.py", line 340, in _run_forward
    results = self(**data, mode=mode)
  File "/home/jianfeng/miniconda3/envs/mmdeploy/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/jianfeng/miniconda3/envs/mmdeploy/lib/python3.8/site-packages/mmdeploy/codebase/mmdet/deploy/object_detection_model.py", line 192, in forward
    outputs = self.predict(inputs)
  File "/home/jianfeng/miniconda3/envs/mmdeploy/lib/python3.8/site-packages/mmdeploy/codebase/mmdet/deploy/object_detection_model.py", line 292, in predict
    outputs = self.wrapper({self.input_name: imgs})
  File "/home/jianfeng/miniconda3/envs/mmdeploy/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/jianfeng/miniconda3/envs/mmdeploy/lib/python3.8/site-packages/mmdeploy/backend/onnxruntime/wrapper.py", line 84, in forward
    self.io_binding.bind_input(
  File "/home/jianfeng/miniconda3/envs/mmdeploy/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 496, in bind_input
    self._iobinding.bind_input(
RuntimeError: Error when binding input: There's no data transfer registered for copying tensors from Device:[DeviceType:1 MemoryType:0 DeviceId:0] to Device:[DeviceType:0 MemoryType:0 DeviceId:0]
09/14 10:08:10 - mmengine - ERROR - tools/deploy.py - create_process - 82 - visualize onnxruntime model failed.

Reproduction

Simply tools/deploy.py, run with the command shown above.

Environment

(mmdeploy) jianfeng@Administrator:/mnt/e/AI/mmdeploy$ python tools/check_env.py
09/14 10:18:05 - mmengine - INFO - 

09/14 10:18:05 - mmengine - INFO - **********Environmental information**********
/bin/sh: 1: /usr/local/cuda-11.8/bin/nvcc: not found
/bin/sh: 1: /usr/local/cuda-11.8/bin/nvcc: not found
09/14 10:18:06 - mmengine - INFO - sys.platform: linux
09/14 10:18:06 - mmengine - INFO - Python: 3.8.18 (default, Sep 11 2023, 13:40:15) [GCC 11.2.0]
09/14 10:18:06 - mmengine - INFO - CUDA available: True
09/14 10:18:06 - mmengine - INFO - numpy_random_seed: 2147483648
09/14 10:18:06 - mmengine - INFO - GPU 0: NVIDIA GeForce RTX 3070 Ti
09/14 10:18:06 - mmengine - INFO - CUDA_HOME: /usr/local/cuda-11.8
09/14 10:18:06 - mmengine - INFO - NVCC: Not Available
09/14 10:18:06 - mmengine - INFO - GCC: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
09/14 10:18:06 - mmengine - INFO - PyTorch: 2.0.1+cu118
09/14 10:18:06 - mmengine - INFO - PyTorch compiling details: PyTorch built with:
  - GCC 9.3
  - C++ Version: 201703
  - Intel(R) oneAPI Math Kernel Library Version 2022.2-Product Build 20220804 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.7.3 (Git Hash 6dbeffbae1f23cbbeae17adb7b5b13f1f37c080e)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 11.8
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90
  - CuDNN 8.7
  - Magma 2.6.1
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.8, CUDNN_VERSION=8.7.0, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wunused-local-typedefs -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_DISABLE_GPU_ASSERTS=ON, TORCH_VERSION=2.0.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, 

09/14 10:18:06 - mmengine - INFO - TorchVision: 0.15.2+cu118
09/14 10:18:06 - mmengine - INFO - OpenCV: 4.8.0
09/14 10:18:06 - mmengine - INFO - MMEngine: 0.8.4
09/14 10:18:06 - mmengine - INFO - MMCV: 2.0.1
09/14 10:18:06 - mmengine - INFO - MMCV Compiler: GCC 9.3
09/14 10:18:06 - mmengine - INFO - MMCV CUDA Compiler: 11.8
09/14 10:18:06 - mmengine - INFO - MMDeploy: 1.2.0+8478249
09/14 10:18:06 - mmengine - INFO - 

09/14 10:18:06 - mmengine - INFO - **********Backend information**********
09/14 10:18:06 - mmengine - INFO - tensorrt:    None
09/14 10:18:06 - mmengine - INFO - ONNXRuntime: 1.15.1
09/14 10:18:06 - mmengine - INFO - ONNXRuntime-gpu:     None
09/14 10:18:06 - mmengine - INFO - ONNXRuntime custom ops:      Available
09/14 10:18:06 - mmengine - INFO - pplnn:       None
09/14 10:18:06 - mmengine - INFO - ncnn:        None
09/14 10:18:07 - mmengine - INFO - snpe:        None
09/14 10:18:07 - mmengine - INFO - openvino:    None
09/14 10:18:07 - mmengine - INFO - torchscript: 2.0.1+cu118
09/14 10:18:07 - mmengine - INFO - torchscript custom ops:      NotAvailable
09/14 10:18:07 - mmengine - INFO - rknn-toolkit:        None
09/14 10:18:07 - mmengine - INFO - rknn-toolkit2:       None
09/14 10:18:07 - mmengine - INFO - ascend:      None
09/14 10:18:07 - mmengine - INFO - coreml:      None
09/14 10:18:07 - mmengine - INFO - tvm: None
09/14 10:18:07 - mmengine - INFO - vacc:        None
09/14 10:18:07 - mmengine - INFO - 

09/14 10:18:07 - mmengine - INFO - **********Codebase information**********
09/14 10:18:07 - mmengine - INFO - mmdet:       3.1.0
09/14 10:18:07 - mmengine - INFO - mmseg:       None
09/14 10:18:07 - mmengine - INFO - mmpretrain:  None
09/14 10:18:07 - mmengine - INFO - mmocr:       None
09/14 10:18:07 - mmengine - INFO - mmagic:      None
09/14 10:18:07 - mmengine - INFO - mmdet3d:     None
09/14 10:18:07 - mmengine - INFO - mmpose:      None
09/14 10:18:07 - mmengine - INFO - mmrotate:    None
09/14 10:18:07 - mmengine - INFO - mmaction:    None
09/14 10:18:07 - mmengine - INFO - mmrazor:     None
09/14 10:18:07 - mmengine - INFO - mmyolo:      None

Error traceback

RuntimeError: Error when binding input: There's no data transfer registered for copying tensors from Device:[DeviceType:1 MemoryType:0 DeviceId:0] to Device:[DeviceType:0 MemoryType:0 DeviceId:0]
09/14 10:08:10 - mmengine - ERROR - tools/deploy.py - create_process - 82 - visualize onnxruntime model failed.
Jianfeng777 commented 1 year ago

Although an ONNX model was converted successfully, running it on my image also produces no result, as shown below:

(mmdeploy-gpu) jianfeng@Administrator:/mnt/e/AI/mmdeploy$ python /mnt/e/AI/mmdeploy/demo/python/object_detection.py cuda /mnt/e/AI/mmdeploy/output/test /mnt/e/AI/mmdeploy/project/test/demo.jpg
loading libmmdeploy_trt_net.so ...
failed to load library libmmdeploy_trt_net.so
loading libmmdeploy_ort_net.so ...
[2023-09-14 10:12:53.911] [mmdeploy] [info] [model.cpp:35] [DirectoryModel] Load model: "/mnt/e/AI/mmdeploy/output/test"
Total time to process the directory: 0.0025 seconds

irexyc commented 1 year ago
  1. Why the conversion reports an error:
09/14 10:08:06 - mmengine - INFO - Finish pipeline mmdeploy.apis.pytorch2onnx.torch2onnx
09/14 10:08:07 - mmengine - INFO - Start pipeline mmdeploy.apis.utils.utils.to_backend in main process
09/14 10:08:07 - mmengine - INFO - Finish pipeline mmdeploy.apis.utils.utils.to_backend
09/14 10:08:07 - mmengine - INFO - visualize onnxruntime model start.

...
2023-09-14:10:08:10 - root - ERROR - Error when binding input: There's no data transfer registered for copying tensors from Device:[DeviceType:1 MemoryType:0 DeviceId:0] to Device:[DeviceType:0 MemoryType:0 DeviceId:0]

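The conversion pipeline is pytorch -> backend model -> visualize. The log shows that the conversion itself succeeded; what failed is the visualization step. The cause is most likely that you installed the CPU build of the onnxruntime Python package but specified cuda as the device, so onnxruntime fails at this point. The SDK did not report an error because it loads the onnxruntime-gpu/lib you downloaded earlier, which is a separate thing from the Python wheel.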
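A quick way to confirm this diagnosis before re-running tools/deploy.py is to ask the installed wheel which execution providers it offers: `onnxruntime.get_available_providers()` lists only `CPUExecutionProvider` on the CPU-only package, which matches the UserWarning in the log. As far as I know, ONNX Runtime numbers devices with DeviceType 0 for CPU and 1 for GPU, so the binding error reads as "a CUDA tensor was handed to a CPU-only session". The sketch below picks a `--device` value accordingly; `pick_device` is a hypothetical helper, not part of mmdeploy.

```python
"""Choose a --device for tools/deploy.py that the installed ONNX Runtime can serve (sketch)."""

from typing import Iterable

# Provider name reported by onnxruntime.get_available_providers() on GPU builds.
CUDA_PROVIDER = "CUDAExecutionProvider"


def pick_device(requested: str, available_providers: Iterable[str]) -> str:
    """Return `requested` if ONNX Runtime can run it, else fall back to "cpu".

    `requested` is a torch-style device string such as "cuda", "cuda:0" or "cpu".
    """
    if requested.startswith("cuda") and CUDA_PROVIDER not in set(available_providers):
        # CPU-only wheel: binding a CUDA input tensor would fail as in the log above.
        return "cpu"
    return requested


if __name__ == "__main__":
    try:
        import onnxruntime as ort
        providers = ort.get_available_providers()
    except (ImportError, OSError):
        # OSError covers GPU wheels that fail to preload CUDA libraries at import time.
        providers = []
    print("available providers:", providers)
    print("suggested: --device", pick_device("cuda", providers))
```

The fallback is deliberately conservative: it never upgrades a `cpu` request to `cuda`, it only downgrades a `cuda` request the wheel cannot serve.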

  2. No results from the SDK

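I suggest either passing --device cpu when converting the model, or running pip install onnxruntime-gpu; either way the visualization step should then succeed. It produces two images, one with the PyTorch result and one with the onnxruntime result; make sure the two look about the same. As for the SDK, it does not print results; it saves an image instead.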
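Since the SDK demo only writes an image, a small wrapper can print the detections to the terminal instead. This is a sketch assuming the mmdeploy 1.x Python SDK (`mmdeploy_runtime.Detector`), whose call is assumed to return `bboxes, labels, masks` with each bbox row laid out as `[x1, y1, x2, y2, score]`; `format_detections` and the 0.3 score threshold are illustrative assumptions, not SDK API.

```python
"""Print detections from the mmdeploy Python SDK instead of only saving an image (sketch)."""

import sys
from typing import List, Sequence


def format_detections(bboxes: Sequence[Sequence[float]],
                      labels: Sequence[int],
                      score_thr: float = 0.3) -> List[str]:
    """Format rows assumed to be [x1, y1, x2, y2, score], one label per row."""
    lines = []
    for bbox, label in zip(bboxes, labels):
        score = float(bbox[4])
        if score < score_thr:
            continue  # skip low-confidence boxes (threshold is an assumption)
        x1, y1, x2, y2 = (int(v) for v in bbox[:4])
        lines.append(f"label={int(label)} score={score:.3f} box=({x1}, {y1}, {x2}, {y2})")
    return lines


if __name__ == "__main__":
    # Assumed mmdeploy 1.x SDK API; arguments mirror demo/python/object_detection.py:
    # <device> <SDK model dir produced with --dump-info> <image>
    import cv2
    from mmdeploy_runtime import Detector

    device, model_dir, image_path = sys.argv[1:4]
    img = cv2.imread(image_path)
    detector = Detector(model_path=model_dir, device_name=device, device_id=0)
    bboxes, labels, _masks = detector(img)
    for line in format_detections(bboxes, labels):
        print(line)
    print(f"{len(bboxes)} raw detections before score filtering")
```

For example: `python print_detections.py cpu /mnt/e/AI/mmdeploy/output/test /mnt/e/AI/mmdeploy/project/test/demo.jpg`, where `print_detections.py` is a hypothetical file name for the sketch. An empty list after filtering, together with a nonzero raw count, points at the threshold rather than at the model.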

Jianfeng777 commented 1 year ago
But after switching to CPU, the problem still occurs.

Jianfeng777 commented 1 year ago

(mmdeploy) jianfeng@Administrator:/mnt/e/AI/mmdeploy$ python tools/deploy.py /mnt/e/AI/mmdeploy/configs/mmdet/detection/detection_onnxruntime_dynamic.py /mnt/e/AI/mmdeploy/project/test/rtmdet_tiny_8xb32-300e_coco.py /mnt/e/AI/mmdeploy/project/test/rtmdet_tiny_8xb32-300e_coco_20220902_112414-78e30dcc.pth /mnt/e/AI/mmdeploy/project/test/demo.jpg --work-dir /mnt/e/AI/mmdeploy/output/test --device cpu --dump-info
09/14 11:18:36 - mmengine - WARNING - Failed to search registry with scope "mmdet" in the "Codebases" registry tree. As a workaround, the current "Codebases" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmdet" is a correct scope, or whether the registry is initialized.
09/14 11:18:36 - mmengine - WARNING - Failed to search registry with scope "mmdet" in the "mmdet_tasks" registry tree. As a workaround, the current "mmdet_tasks" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmdet" is a correct scope, or whether the registry is initialized.
09/14 11:18:38 - mmengine - INFO - Start pipeline mmdeploy.apis.pytorch2onnx.torch2onnx in subprocess
09/14 11:18:38 - mmengine - WARNING - Failed to search registry with scope "mmdet" in the "Codebases" registry tree. As a workaround, the current "Codebases" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmdet" is a correct scope, or whether the registry is initialized.
09/14 11:18:38 - mmengine - WARNING - Failed to search registry with scope "mmdet" in the "mmdet_tasks" registry tree. As a workaround, the current "mmdet_tasks" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmdet" is a correct scope, or whether the registry is initialized.
Loads checkpoint by local backend from path: /mnt/e/AI/mmdeploy/project/test/rtmdet_tiny_8xb32-300e_coco_20220902_112414-78e30dcc.pth
The model and loaded state dict do not match exactly

unexpected key in source state_dict: data_preprocessor.mean, data_preprocessor.std

09/14 11:18:39 - mmengine - WARNING - DeprecationWarning: get_onnx_config will be deprecated in the future.
09/14 11:18:39 - mmengine - INFO - Export PyTorch model to ONNX: /mnt/e/AI/mmdeploy/output/test/end2end.onnx.
/home/jianfeng/miniconda3/envs/mmdeploy/lib/python3.8/site-packages/mmdeploy/core/optimizers/function_marker.py:160: TracerWarning: Converting a tensor to a Python integer might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  ys_shape = tuple(int(s) for s in ys.shape)
/home/jianfeng/miniconda3/envs/mmdeploy/lib/python3.8/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3483.)
  return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
/home/jianfeng/miniconda3/envs/mmdeploy/lib/python3.8/site-packages/mmdeploy/mmcv/ops/nms.py:285: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  iou_threshold = torch.tensor([iou_threshold], dtype=torch.float32)
/home/jianfeng/miniconda3/envs/mmdeploy/lib/python3.8/site-packages/mmdeploy/mmcv/ops/nms.py:286: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  score_threshold = torch.tensor([score_threshold], dtype=torch.float32)
/home/jianfeng/miniconda3/envs/mmdeploy/lib/python3.8/site-packages/mmdeploy/pytorch/functions/topk.py:28: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  k = torch.tensor(k, device=input.device, dtype=torch.long)
/home/jianfeng/miniconda3/envs/mmdeploy/lib/python3.8/site-packages/mmdeploy/mmcv/ops/nms.py:44: TracerWarning: Converting a tensor to a Python float might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  score_threshold = float(score_threshold)
/home/jianfeng/miniconda3/envs/mmdeploy/lib/python3.8/site-packages/mmdeploy/mmcv/ops/nms.py:45: TracerWarning: Converting a tensor to a Python float might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  iou_threshold = float(iou_threshold)
/home/jianfeng/miniconda3/envs/mmdeploy/lib/python3.8/site-packages/mmcv/ops/nms.py:123: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert boxes.size(1) == 4
/home/jianfeng/miniconda3/envs/mmdeploy/lib/python3.8/site-packages/mmcv/ops/nms.py:124: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert boxes.size(0) == scores.size(0)
/home/jianfeng/miniconda3/envs/mmdeploy/lib/python3.8/site-packages/torch/onnx/symbolic_opset9.py:5589: UserWarning: Exporting aten::index operator of advanced indexing in opset 11 is achieved by combination of multiple ONNX operators, including Reshape, Transpose, Concat, and Gather. If indices include negative values, the exported graph will produce incorrect results.
  warnings.warn(
09/14 11:18:41 - mmengine - INFO - Execute onnx optimize passes.
09/14 11:18:41 - mmengine - WARNING - Can not optimize model, please build torchscipt extension. More details: https://github.com/open-mmlab/mmdeploy/tree/main/docs/en/experimental/onnx_optimizer.md
============= Diagnostic Run torch.onnx.export version 2.0.1+cu118 =============
verbose: False, log level: Level.ERROR
======================= 0 NONE 0 NOTE 0 WARNING 0 ERROR ========================

09/14 11:18:41 - mmengine - INFO - Finish pipeline mmdeploy.apis.pytorch2onnx.torch2onnx
09/14 11:18:41 - mmengine - INFO - Start pipeline mmdeploy.apis.utils.utils.to_backend in main process
09/14 11:18:41 - mmengine - INFO - Finish pipeline mmdeploy.apis.utils.utils.to_backend
09/14 11:18:41 - mmengine - INFO - visualize onnxruntime model start.
09/14 11:18:43 - mmengine - WARNING - Failed to search registry with scope "mmdet" in the "Codebases" registry tree. As a workaround, the current "Codebases" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmdet" is a correct scope, or whether the registry is initialized.
09/14 11:18:43 - mmengine - WARNING - Failed to search registry with scope "mmdet" in the "mmdet_tasks" registry tree. As a workaround, the current "mmdet_tasks" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmdet" is a correct scope, or whether the registry is initialized.
09/14 11:18:44 - mmengine - WARNING - Failed to search registry with scope "mmdet" in the "backend_detectors" registry tree. As a workaround, the current "backend_detectors" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmdet" is a correct scope, or whether the registry is initialized.
2023-09-14:11:18:44 - root - ERROR - libcurand.so.10: cannot open shared object file: No such file or directory
Traceback (most recent call last):
  File "/home/jianfeng/miniconda3/envs/mmdeploy/lib/python3.8/site-packages/mmdeploy/utils/utils.py", line 41, in target_wrapper
    result = target(*args, **kwargs)
  File "/home/jianfeng/miniconda3/envs/mmdeploy/lib/python3.8/site-packages/mmdeploy/apis/visualize.py", line 65, in visualize_model
    model = task_processor.build_backend_model(
  File "/home/jianfeng/miniconda3/envs/mmdeploy/lib/python3.8/site-packages/mmdeploy/codebase/mmdet/deploy/object_detection.py", line 157, in build_backend_model
    model = build_object_detection_model(
  File "/home/jianfeng/miniconda3/envs/mmdeploy/lib/python3.8/site-packages/mmdeploy/codebase/mmdet/deploy/object_detection_model.py", line 939, in build_object_detection_model
    backend_detector = __BACKEND_MODEL.build(
  File "/home/jianfeng/miniconda3/envs/mmdeploy/lib/python3.8/site-packages/mmengine/registry/registry.py", line 570, in build
    return self.build_func(cfg, *args, **kwargs, registry=self)
  File "/home/jianfeng/miniconda3/envs/mmdeploy/lib/python3.8/site-packages/mmengine/registry/build_functions.py", line 121, in build_from_cfg
    obj = obj_cls(**args) # type: ignore
  File "/home/jianfeng/miniconda3/envs/mmdeploy/lib/python3.8/site-packages/mmdeploy/codebase/mmdet/deploy/object_detection_model.py", line 55, in __init__
    self._init_wrapper(
  File "/home/jianfeng/miniconda3/envs/mmdeploy/lib/python3.8/site-packages/mmdeploy/codebase/mmdet/deploy/object_detection_model.py", line 69, in _init_wrapper
    self.wrapper = BaseBackendModel._build_wrapper(
  File "/home/jianfeng/miniconda3/envs/mmdeploy/lib/python3.8/site-packages/mmdeploy/codebase/base/backend_model.py", line 65, in _build_wrapper
    return backend_mgr.build_wrapper(backend_files, device, input_names,
  File "/home/jianfeng/miniconda3/envs/mmdeploy/lib/python3.8/site-packages/mmdeploy/backend/onnxruntime/backend_manager.py", line 33, in build_wrapper
    from .wrapper import ORTWrapper
  File "/home/jianfeng/miniconda3/envs/mmdeploy/lib/python3.8/site-packages/mmdeploy/backend/onnxruntime/wrapper.py", line 5, in <module>
    import onnxruntime as ort
  File "/home/jianfeng/miniconda3/envs/mmdeploy/lib/python3.8/site-packages/onnxruntime/__init__.py", line 34, in <module>
    raise import_capi_exception
  File "/home/jianfeng/miniconda3/envs/mmdeploy/lib/python3.8/site-packages/onnxruntime/__init__.py", line 23, in <module>
    from onnxruntime.capi._pybind_state import get_all_providers, get_available_providers, get_device, set_seed, \
  File "/home/jianfeng/miniconda3/envs/mmdeploy/lib/python3.8/site-packages/onnxruntime/capi/_pybind_state.py", line 11, in <module>
    from . import _ld_preload # noqa: F401
  File "/home/jianfeng/miniconda3/envs/mmdeploy/lib/python3.8/site-packages/onnxruntime/capi/_ld_preload.py", line 15, in <module>
    _libcurand = CDLL("libcurand.so.10", mode=RTLD_GLOBAL)
  File "/home/jianfeng/miniconda3/envs/mmdeploy/lib/python3.8/ctypes/__init__.py", line 373, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libcurand.so.10: cannot open shared object file: No such file or directory
09/14 11:18:44 - mmengine - ERROR - tools/deploy.py - create_process - 82 - visualize onnxruntime model failed.

irexyc commented 1 year ago

OSError: libcurand.so.10: cannot open shared object file: No such file or directory

Did you install onnxruntime-gpu again?

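Look at the error in the log: it comes from onnxruntime. My guess is that you installed version 1.8.1? As I recall, older onnxruntime-gpu releases depend on CUDA/cuDNN at import time; you could try installing the latest onnxruntime-gpu.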

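I'd suggest switching to the CPU version; otherwise you need to install the CUDA toolkit and cuDNN and set up LD_LIBRARY_PATH (download them from NVIDIA, don't install them with conda).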
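To check whether the dynamic loader can actually find the CUDA libraries an older onnxruntime-gpu wheel preloads at import time, one can try opening them with `ctypes`, the same mechanism `_ld_preload.py` uses in the traceback. This is a minimal sketch: `missing_libraries` is a hypothetical helper, and `libcurand.so.10` is the only library name taken from the log; any other names you pass on the command line are your own.

```python
"""Check whether shared libraries can be opened by the dynamic loader (diagnostic sketch)."""

import ctypes
import sys
from typing import Iterable, List


def missing_libraries(names: Iterable[str]) -> List[str]:
    """Return the names that ctypes.CDLL cannot open.

    On Linux a bare soname such as "libcurand.so.10" is resolved by dlopen,
    which consults LD_LIBRARY_PATH, the ld.so cache and the default directories.
    """
    missing = []
    for name in names:
        try:
            ctypes.CDLL(name)
        except OSError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # libcurand.so.10 comes from the traceback; extra sonames may be passed as arguments.
    wanted = ["libcurand.so.10"] + sys.argv[1:]
    absent = missing_libraries(wanted)
    if absent:
        print("not loadable:", ", ".join(absent))
        print("install matching CUDA/cuDNN and add their lib dir to LD_LIBRARY_PATH,")
        print("or switch to the CPU-only onnxruntime wheel")
    else:
        print("all libraries loadable")
```

Note that LD_LIBRARY_PATH is read when the process starts, so export it in the shell before launching Python rather than setting `os.environ` inside the script.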

Jianfeng777 commented 1 year ago

Indeed, skipping the GPU is much more convenient. After reinstalling, the CPU setup works.

github-actions[bot] commented 1 year ago

This issue is marked as stale because it has been marked as invalid or awaiting response for 7 days without any further response. It will be closed in 5 days if the stale label is not removed or if there is no further response.

github-actions[bot] commented 1 year ago

This issue is closed because it has been stale for 5 days. Please open a new issue if you have similar issues or you have any new updates now.

happybear1015 commented 1 month ago

To sum up: I ran into the same problem as you. The model exports, but it can't be used. During export it reports this error: 08/19 15:07:52 - mmengine - ERROR - mmdeploy/tools/deploy.py - create_process - 82 - visualize onnxruntime model failed.

Solution:
pip uninstall onnxruntime-gpu
pip install onnxruntime

python mmdeploy/tools/deploy.py mmdeploy/configs/mmdet/instance-seg/instance-seg_onnxruntime_dynamic.py mmdetection/configs/mask_rcnn/mask-rcnn_r101_fpn_ms-poly-3x_coco.py epoch_11.pth 1.jpg --work-dir mmdeploy_model/EPOCH11 --device cpu --dump-info
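Because the fix here comes down to which ONNX Runtime wheel is installed, it can help to check for both distributions before converting. As far as I know, `onnxruntime` and `onnxruntime-gpu` install the same `onnxruntime` import package, so having both present makes it unclear which build you actually import. The sketch uses only `importlib.metadata` (Python 3.8+); `installed_ort` and `classify` are hypothetical helper names.

```python
"""Report which ONNX Runtime wheels are installed (diagnostic sketch)."""

from importlib import metadata
from typing import Callable, Dict, Optional

# PyPI distribution names; both are assumed to provide the `onnxruntime` import package.
ORT_DISTS = ("onnxruntime", "onnxruntime-gpu")


def installed_ort(lookup: Callable[[str], str] = metadata.version) -> Dict[str, Optional[str]]:
    """Map each ONNX Runtime distribution to its installed version, or None if absent."""
    found: Dict[str, Optional[str]] = {}
    for dist in ORT_DISTS:
        try:
            found[dist] = lookup(dist)
        except metadata.PackageNotFoundError:
            found[dist] = None
    return found


def classify(found: Dict[str, Optional[str]]) -> str:
    """Summarize the install state as 'conflict', 'gpu', 'cpu' or 'none'."""
    cpu, gpu = found.get("onnxruntime"), found.get("onnxruntime-gpu")
    if cpu and gpu:
        return "conflict"  # keep only one, e.g. pip uninstall onnxruntime-gpu
    if gpu:
        return "gpu"  # --device cuda needs matching CUDA/cuDNN on LD_LIBRARY_PATH
    if cpu:
        return "cpu"  # convert with --device cpu, as in the command above
    return "none"


if __name__ == "__main__":
    state = installed_ort()
    print(state, "->", classify(state))
```

The `lookup` parameter exists only so the check can be exercised without touching the real environment; in normal use call `installed_ort()` with no arguments.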