open-mmlab / mmdeploy

OpenMMLab Model Deployment Framework
https://mmdeploy.readthedocs.io/en/latest/
Apache License 2.0

[Bug] #2774

Open limesqueezy opened 5 months ago

limesqueezy commented 5 months ago


Describe the bug

Greetings,

Hope y'all are doing well :)

Simply following the getting started guide doesn't even produce a working Faster R-CNN, let alone a more complicated model.

The errors are vague, and honestly I don't understand what another human being is supposed to figure out from "Plugin not found, are the plugin name, version, and namespace correct?".

Versions are not well documented either.

As an example, the guides tell you to run pip install onnxruntime-gpu==1.8.1, a version that no longer exists on PyPI, yet mmdeploy is so outdated that it still pins 1.8.1.
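(To see which versions actually exist on PyPI, pip's index subcommand works, though it is still marked experimental:)

pip index versions onnxruntime-gpu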

Minimal versions of the error

Assuming that both the GPU (TensorRT) and CPU (ONNX Runtime) conversions at least produce an ONNX file, loading it directly fails:

>>> import onnxruntime as ort
>>> session = ort.InferenceSession('mmdeploy_model/faster-rcnn/end2end.onnx', providers=['CUDAExecutionProvider'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/lemon/miniconda3/envs/openmmlab/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 419, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "/home/lemon/miniconda3/envs/openmmlab/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 472, in _create_inference_session
    sess = C.InferenceSession(session_options, self._model_path, True, self._read_config_from_model)
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Load model from mmdeploy_model/faster-rcnn/end2end.onnx failed:Fatal error: mmdeploy:GatherTopk(-1) is not a registered function/op
>>> session = ort.InferenceSession('mmdeploy_model/faster-rcnn/end2end.onnx', providers=['CPUExecutionProvider'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/lemon/miniconda3/envs/openmmlab/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 419, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "/home/lemon/miniconda3/envs/openmmlab/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 472, in _create_inference_session
    sess = C.InferenceSession(session_options, self._model_path, True, self._read_config_from_model)
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Load model from mmdeploy_model/faster-rcnn/end2end.onnx failed:Fatal error: mmdeploy:GatherTopk(-1) is not a registered function/op

Meanwhile, the deploy pipeline itself completely fails both to create a TensorRT engine and to run inference with the ONNX model.
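For reference, the "mmdeploy:GatherTopk(-1) is not a registered function/op" error above just means the session was created without mmdeploy's custom op library, since GatherTopk is not a standard ONNX op. A minimal sketch of registering it, assuming the ops library has been built and sits at the path the CPU log further down actually prints:

import onnxruntime as ort

# Path taken from the log line "Successfully loaded onnxruntime custom ops from ..."
ops_lib = '/home/lemon/anomaly/mmdeploy/mmdeploy/lib/libmmdeploy_onnxruntime_ops.so'

so = ort.SessionOptions()
so.register_custom_ops_library(ops_lib)  # makes mmdeploy::GatherTopk resolvable

session = ort.InferenceSession(
    'mmdeploy_model/faster-rcnn/end2end.onnx',
    sess_options=so,
    providers=['CPUExecutionProvider'],
)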

Using GPU

python mmdeploy/tools/deploy.py \
    mmdeploy/configs/mmdet/detection/detection_tensorrt_dynamic-320x320-1344x1344.py \
    mmdetection/configs/faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py \
    checkpoints/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth \
    mmdetection/demo/demo.jpg \
    --work-dir mmdeploy_model/faster-rcnn \
    --device cuda \
    --dump-info

With only onnxruntime-gpu installed (the exact package versions that run on my machine are listed at the end of this section), the conversion fails with:

RuntimeError: Failed to parse onnx, In node 426 (importFallbackPluginImporter): UNSUPPORTED_NODE: Assertion failed: creator && "Plugin not found, are the plugin name, version, and namespace correct?"

05/31 18:24:05 - mmengine - ERROR - /home/lemon/anomaly/mmdeploy/mmdeploy/apis/core/pipeline_manager.py - pop_mp_output - 80 - `mmdeploy.apis.utils.utils.to_backend` with Call id: 1 failed. exit.

Full output,

05/31 18:23:49 - mmengine - WARNING - Failed to search registry with scope "mmdet" in the "Codebases" registry tree. As a workaround, the current "Codebases" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmdet" is a correct scope, or whether the registry is initialized.
05/31 18:23:49 - mmengine - WARNING - Failed to search registry with scope "mmdet" in the "mmdet_tasks" registry tree. As a workaround, the current "mmdet_tasks" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmdet" is a correct scope, or whether the registry is initialized.
05/31 18:23:50 - mmengine - INFO - Start pipeline mmdeploy.apis.pytorch2onnx.torch2onnx in subprocess
05/31 18:23:51 - mmengine - WARNING - Failed to search registry with scope "mmdet" in the "Codebases" registry tree. As a workaround, the current "Codebases" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmdet" is a correct scope, or whether the registry is initialized.
05/31 18:23:51 - mmengine - WARNING - Failed to search registry with scope "mmdet" in the "mmdet_tasks" registry tree. As a workaround, the current "mmdet_tasks" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmdet" is a correct scope, or whether the registry is initialized.
Loads checkpoint by local backend from path: checkpoints/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth
05/31 18:23:52 - mmengine - WARNING - DeprecationWarning: get_onnx_config will be deprecated in the future. 
05/31 18:23:52 - mmengine - INFO - Export PyTorch model to ONNX: mmdeploy_model/faster-rcnn/end2end.onnx.
05/31 18:23:52 - mmengine - WARNING - Can not find torch.nn.functional._scaled_dot_product_attention, function rewrite will not be applied
05/31 18:23:52 - mmengine - WARNING - Can not find mmdet.models.utils.transformer.PatchMerging.forward, function rewrite will not be applied
/home/lemon/anomaly/mmdeploy/mmdeploy/core/optimizers/function_marker.py:160: TracerWarning: Converting a tensor to a Python integer might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  ys_shape = tuple(int(s) for s in ys.shape)
/home/lemon/anomaly/mmdetection/mmdet/models/dense_heads/anchor_head.py:115: UserWarning: DeprecationWarning: anchor_generator is deprecated, please use "prior_generator" instead
  warnings.warn('DeprecationWarning: anchor_generator is deprecated, '
/home/lemon/anomaly/mmdetection/mmdet/models/task_modules/prior_generators/anchor_generator.py:356: UserWarning: ``grid_anchors`` would be deprecated soon. Please use ``grid_priors`` 
  warnings.warn('``grid_anchors`` would be deprecated soon. '
/home/lemon/anomaly/mmdetection/mmdet/models/task_modules/prior_generators/anchor_generator.py:392: UserWarning: ``single_level_grid_anchors`` would be deprecated soon. Please use ``single_level_grid_priors`` 
  warnings.warn(
/home/lemon/anomaly/mmdeploy/mmdeploy/codebase/mmdet/models/dense_heads/rpn_head.py:89: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert cls_score.size()[-2:] == bbox_pred.size()[-2:]
/home/lemon/anomaly/mmdeploy/mmdeploy/pytorch/functions/topk.py:58: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if k > size:
/home/lemon/anomaly/mmdeploy/mmdeploy/codebase/mmdet/models/task_modules/coders/delta_xywh_bbox_coder.py:38: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert pred_bboxes.size(0) == bboxes.size(0)
/home/lemon/anomaly/mmdeploy/mmdeploy/codebase/mmdet/models/task_modules/coders/delta_xywh_bbox_coder.py:40: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert pred_bboxes.size(1) == bboxes.size(1)
/home/lemon/anomaly/mmdeploy/mmdeploy/codebase/mmdet/deploy/utils.py:95: TracerWarning: Using len to get tensor shape might cause the trace to be incorrect. Recommended usage would be tensor.shape[0]. Passing a tensor of different shape might lead to errors or silently give incorrect results.
  assert len(max_shape) == 2, '`max_shape` should be [h, w]'
/home/lemon/anomaly/mmdeploy/mmdeploy/mmcv/ops/nms.py:477: TracerWarning: Converting a tensor to a Python integer might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  int(scores.shape[-1]),
/home/lemon/anomaly/mmdeploy/mmdeploy/mmcv/ops/nms.py:149: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  out_boxes = min(num_boxes, after_topk)
/home/lemon/anomaly/mmdeploy/mmdeploy/codebase/mmdet/models/roi_heads/standard_roi_head.py:41: TracerWarning: Converting a tensor to a Python integer might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  rois_dims = int(rois.shape[-1])
[W shape_type_inference.cpp:1974] Warning: The shape inference of mmdeploy::GatherTopk type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function. (function UpdateReliable)
[W shape_type_inference.cpp:1974] Warning: The shape inference of mmdeploy::GatherTopk type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function. (function UpdateReliable)
/home/lemon/miniconda3/envs/openmmlab/lib/python3.8/site-packages/torch/onnx/symbolic_opset9.py:5856: UserWarning: Exporting aten::index operator of advanced indexing in opset 11 is achieved by combination of multiple ONNX operators, including Reshape, Transpose, Concat, and Gather. If indices include negative values, the exported graph will produce incorrect results.
  warnings.warn(
[W shape_type_inference.cpp:1974] Warning: The shape inference of mmdeploy::GatherTopk type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function. (function UpdateReliable)
[W shape_type_inference.cpp:1974] Warning: The shape inference of mmdeploy::GatherTopk type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function. (function UpdateReliable)
[W shape_type_inference.cpp:1974] Warning: The shape inference of mmdeploy::GatherTopk type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function. (function UpdateReliable)
[W shape_type_inference.cpp:1974] Warning: The shape inference of mmdeploy::GatherTopk type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function. (function UpdateReliable)
[W shape_type_inference.cpp:1974] Warning: The shape inference of mmdeploy::GatherTopk type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function. (function UpdateReliable)
[W shape_type_inference.cpp:1974] Warning: The shape inference of mmdeploy::GatherTopk type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function. (function UpdateReliable)
[W shape_type_inference.cpp:1974] Warning: The shape inference of mmdeploy::GatherTopk type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function. (function UpdateReliable)
[W shape_type_inference.cpp:1974] Warning: The shape inference of mmdeploy::GatherTopk type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function. (function UpdateReliable)
[W shape_type_inference.cpp:1974] Warning: The shape inference of mmdeploy::TRTBatchedNMS type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function. (function UpdateReliable)
[W shape_type_inference.cpp:1974] Warning: The shape inference of mmdeploy::TRTBatchedNMS type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function. (function UpdateReliable)
[W shape_type_inference.cpp:1974] Warning: The shape inference of mmdeploy::MMCVMultiLevelRoiAlign type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function. (function UpdateReliable)
[W shape_type_inference.cpp:1974] Warning: The shape inference of mmdeploy::TRTBatchedNMS type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function. (function UpdateReliable)
[W shape_type_inference.cpp:1974] Warning: The shape inference of mmdeploy::TRTBatchedNMS type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function. (function UpdateReliable)
05/31 18:23:56 - mmengine - INFO - Execute onnx optimize passes.
05/31 18:23:57 - mmengine - INFO - Finish pipeline mmdeploy.apis.pytorch2onnx.torch2onnx
05/31 18:23:59 - mmengine - INFO - Start pipeline mmdeploy.apis.utils.utils.to_backend in subprocess
05/31 18:23:59 - mmengine - WARNING - Could not load the library of tensorrt plugins.             Because the file does not exist: 
[05/31/2024-18:23:59] [TRT] [I] [MemUsageChange] Init CUDA: CPU +14, GPU +0, now: CPU 121, GPU 269 (MiB)
[05/31/2024-18:24:04] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +1453, GPU +268, now: CPU 1651, GPU 537 (MiB)
[05/31/2024-18:24:04] [TRT] [I] ----------------------------------------------------------------
[05/31/2024-18:24:04] [TRT] [I] Input filename:   mmdeploy_model/faster-rcnn/end2end.onnx
[05/31/2024-18:24:04] [TRT] [I] ONNX IR version:  0.0.6
[05/31/2024-18:24:04] [TRT] [I] Opset version:    11
[05/31/2024-18:24:04] [TRT] [I] Producer name:    pytorch
[05/31/2024-18:24:04] [TRT] [I] Producer version: 2.1.0
[05/31/2024-18:24:04] [TRT] [I] Domain:           
[05/31/2024-18:24:04] [TRT] [I] Model version:    0
[05/31/2024-18:24:04] [TRT] [I] Doc string:       
[05/31/2024-18:24:04] [TRT] [I] ----------------------------------------------------------------
[05/31/2024-18:24:04] [TRT] [W] onnx2trt_utils.cpp:374: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[05/31/2024-18:24:04] [TRT] [W] onnx2trt_utils.cpp:400: One or more weights outside the range of INT32 was clamped
[05/31/2024-18:24:04] [TRT] [I] No importer registered for op: GatherTopk. Attempting to import as plugin.
[05/31/2024-18:24:04] [TRT] [I] Searching for plugin: GatherTopk, plugin_version: 1, plugin_namespace: 
[05/31/2024-18:24:04] [TRT] [E] 3: getPluginCreator could not find plugin: GatherTopk version: 1
[05/31/2024-18:24:04] [TRT] [E] ModelImporter.cpp:771: While parsing node number 426 [GatherTopk -> "/GatherTopk_output_0"]:
[05/31/2024-18:24:04] [TRT] [E] ModelImporter.cpp:772: --- Begin node ---
[05/31/2024-18:24:04] [TRT] [E] ModelImporter.cpp:773: input: "/Concat_21_output_0"
input: "/TopK_output_1"
output: "/GatherTopk_output_0"
name: "/GatherTopk"
op_type: "GatherTopk"
domain: "mmdeploy"

[05/31/2024-18:24:04] [TRT] [E] ModelImporter.cpp:774: --- End node ---
[05/31/2024-18:24:04] [TRT] [E] ModelImporter.cpp:777: ERROR: builtin_op_importers.cpp:5404 In function importFallbackPluginImporter:
[8] Assertion failed: creator && "Plugin not found, are the plugin name, version, and namespace correct?"
Process Process-3:
Traceback (most recent call last):
  File "/home/lemon/miniconda3/envs/openmmlab/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/lemon/miniconda3/envs/openmmlab/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/lemon/anomaly/mmdeploy/mmdeploy/apis/core/pipeline_manager.py", line 107, in __call__
    ret = func(*args, **kwargs)
  File "/home/lemon/anomaly/mmdeploy/mmdeploy/apis/utils/utils.py", line 98, in to_backend
    return backend_mgr.to_backend(
  File "/home/lemon/anomaly/mmdeploy/mmdeploy/backend/tensorrt/backend_manager.py", line 127, in to_backend
    onnx2tensorrt(
  File "/home/lemon/anomaly/mmdeploy/mmdeploy/backend/tensorrt/onnx2tensorrt.py", line 79, in onnx2tensorrt
    from_onnx(
  File "/home/lemon/anomaly/mmdeploy/mmdeploy/backend/tensorrt/utils.py", line 185, in from_onnx
    raise RuntimeError(f'Failed to parse onnx, {error_msgs}')
RuntimeError: Failed to parse onnx, In node 426 (importFallbackPluginImporter): UNSUPPORTED_NODE: Assertion failed: creator && "Plugin not found, are the plugin name, version, and namespace correct?"

05/31 18:24:05 - mmengine - ERROR - /home/lemon/anomaly/mmdeploy/mmdeploy/apis/core/pipeline_manager.py - pop_mp_output - 80 - `mmdeploy.apis.utils.utils.to_backend` with Call id: 1 failed. exit.

mmcv                  2.1.0
mmdeploy              1.3.1    /home/lemon/anomaly/mmdeploy
mmdeploy-runtime      1.3.1
mmdeploy-runtime-gpu  1.3.1
mmdet                 3.3.0    /home/lemon/anomaly/mmdetection
mmengine              0.10.4
onnx                  1.16.0
onnxruntime-gpu       1.18.0
tensorrt              8.6.1
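Note the warning in the log above, "Could not load the library of tensorrt plugins. Because the file does not exist:", together with "tensorrt custom ops: NotAvailable" in the environment dump below: the parser can only resolve mmdeploy::GatherTopk if the plugin shared library is loaded into the process before parsing. This is the general mechanism (not mmdeploy's exact loader; the path is a placeholder for a library built from source):

import ctypes

import tensorrt as trt

# Loading the shared object runs its static plugin registrations.
ctypes.CDLL('/path/to/mmdeploy/lib/libmmdeploy_tensorrt_ops.so')

# Initialize TensorRT's plugin registry in the default namespace.
trt.init_libnvinfer_plugins(trt.Logger(trt.Logger.WARNING), '')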

Using CPU

python mmdeploy/tools/deploy.py \
    mmdeploy/configs/mmdet/detection/detection_onnxruntime_dynamic.py \
    mmdetection/configs/faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py \
    checkpoints/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth \
    mmdetection/demo/demo.jpg \
    --work-dir mmdeploy_model_output/faster-rcnn \
    --device cpu \
    --dump-info

This results in an error that has been documented again and again, usually blamed on a broken .pth file, but I doubt there is anything wrong with the checkpoint from the getting started guide.

onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Load model from mmdeploy_model_output/faster-rcnn/end2end.onnx failed:Type Error: Type parameter (T) of Optype (Where) bound to different types (tensor(int64) and tensor(float) in node (/Where_11).
05/31 17:53:35 - mmengine - ERROR - mmdeploy/tools/deploy.py - create_process - 82 - visualize onnxruntime model failed.
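The complaint is that the two value inputs of a Where node must share one element type T, but /Where_11 got tensor(int64) on one branch and tensor(float) on the other. A small diagnostic sketch to locate the node with nothing but the onnx package (model path as above):

import onnx

model = onnx.load('mmdeploy_model_output/faster-rcnn/end2end.onnx')

# Inputs of Where are (condition, X, Y); X and Y must have the same type.
for node in model.graph.node:
    if node.op_type == 'Where' and node.name == '/Where_11':
        print(node.name, list(node.input))

# Strict checking reproduces the same type complaint ORT raises at load time.
onnx.checker.check_model(model, full_check=True)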

Full error,

05/31 17:53:24 - mmengine - WARNING - Failed to search registry with scope "mmdet" in the "Codebases" registry tree. As a workaround, the current "Codebases" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmdet" is a correct scope, or whether the registry is initialized.
05/31 17:53:24 - mmengine - WARNING - Failed to search registry with scope "mmdet" in the "mmdet_tasks" registry tree. As a workaround, the current "mmdet_tasks" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmdet" is a correct scope, or whether the registry is initialized.
05/31 17:53:25 - mmengine - INFO - Start pipeline mmdeploy.apis.pytorch2onnx.torch2onnx in subprocess
05/31 17:53:26 - mmengine - WARNING - Failed to search registry with scope "mmdet" in the "Codebases" registry tree. As a workaround, the current "Codebases" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmdet" is a correct scope, or whether the registry is initialized.
05/31 17:53:26 - mmengine - WARNING - Failed to search registry with scope "mmdet" in the "mmdet_tasks" registry tree. As a workaround, the current "mmdet_tasks" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmdet" is a correct scope, or whether the registry is initialized.
Loads checkpoint by local backend from path: checkpoints/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth
05/31 17:53:26 - mmengine - WARNING - DeprecationWarning: get_onnx_config will be deprecated in the future. 
05/31 17:53:26 - mmengine - INFO - Export PyTorch model to ONNX: mmdeploy_model_output/faster-rcnn/end2end.onnx.
/home/lemon/anomaly/mmdeploy/mmdeploy/core/optimizers/function_marker.py:160: TracerWarning: Converting a tensor to a Python integer might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  ys_shape = tuple(int(s) for s in ys.shape)
/home/lemon/anomaly/mmdetection/mmdet/models/dense_heads/anchor_head.py:115: UserWarning: DeprecationWarning: anchor_generator is deprecated, please use "prior_generator" instead
  warnings.warn('DeprecationWarning: anchor_generator is deprecated, '
/home/lemon/anomaly/mmdetection/mmdet/models/task_modules/prior_generators/anchor_generator.py:356: UserWarning: ``grid_anchors`` would be deprecated soon. Please use ``grid_priors`` 
  warnings.warn('``grid_anchors`` would be deprecated soon. '
/home/lemon/anomaly/mmdetection/mmdet/models/task_modules/prior_generators/anchor_generator.py:392: UserWarning: ``single_level_grid_anchors`` would be deprecated soon. Please use ``single_level_grid_priors`` 
  warnings.warn(
/home/lemon/anomaly/mmdeploy/mmdeploy/codebase/mmdet/models/dense_heads/rpn_head.py:89: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert cls_score.size()[-2:] == bbox_pred.size()[-2:]
/home/lemon/anomaly/mmdeploy/mmdeploy/pytorch/functions/topk.py:28: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  k = torch.tensor(k, device=input.device, dtype=torch.long)
/home/lemon/anomaly/mmdeploy/mmdeploy/codebase/mmdet/models/task_modules/coders/delta_xywh_bbox_coder.py:38: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert pred_bboxes.size(0) == bboxes.size(0)
/home/lemon/anomaly/mmdeploy/mmdeploy/codebase/mmdet/models/task_modules/coders/delta_xywh_bbox_coder.py:40: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert pred_bboxes.size(1) == bboxes.size(1)
/home/lemon/anomaly/mmdeploy/mmdeploy/codebase/mmdet/deploy/utils.py:48: TracerWarning: Using len to get tensor shape might cause the trace to be incorrect. Recommended usage would be tensor.shape[0]. Passing a tensor of different shape might lead to errors or silently give incorrect results.
  assert len(max_shape) == 2, '`max_shape` should be [h, w]'
/home/lemon/anomaly/mmdeploy/mmdeploy/mmcv/ops/nms.py:285: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  iou_threshold = torch.tensor([iou_threshold], dtype=torch.float32)
/home/lemon/anomaly/mmdeploy/mmdeploy/mmcv/ops/nms.py:286: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  score_threshold = torch.tensor([score_threshold], dtype=torch.float32)
/home/lemon/anomaly/mmdeploy/mmdeploy/mmcv/ops/nms.py:45: TracerWarning: Converting a tensor to a Python float might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  score_threshold = float(score_threshold)
/home/lemon/anomaly/mmdeploy/mmdeploy/mmcv/ops/nms.py:46: TracerWarning: Converting a tensor to a Python float might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  iou_threshold = float(iou_threshold)
/home/lemon/miniconda3/envs/openmmlab/lib/python3.8/site-packages/mmcv/ops/nms.py:123: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert boxes.size(1) == 4
/home/lemon/miniconda3/envs/openmmlab/lib/python3.8/site-packages/mmcv/ops/nms.py:124: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert boxes.size(0) == scores.size(0)
/home/lemon/anomaly/mmdeploy/mmdeploy/codebase/mmdet/models/roi_heads/standard_roi_head.py:41: TracerWarning: Converting a tensor to a Python integer might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  rois_dims = int(rois.shape[-1])
/home/lemon/miniconda3/envs/openmmlab/lib/python3.8/site-packages/mmcv/ops/roi_align.py:78: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert rois.size(1) == 5, 'RoI must be (idx, x1, y1, x2, y2)!'
/home/lemon/miniconda3/envs/openmmlab/lib/python3.8/site-packages/torch/onnx/symbolic_opset9.py:5856: UserWarning: Exporting aten::index operator of advanced indexing in opset 11 is achieved by combination of multiple ONNX operators, including Reshape, Transpose, Concat, and Gather. If indices include negative values, the exported graph will produce incorrect results.
  warnings.warn(
/home/lemon/anomaly/mmdeploy/mmdeploy/mmcv/ops/roi_align.py:64: FutureWarning: 'torch.onnx.symbolic_opset9._cast_Long' is deprecated in version 2.0 and will be removed in the future. Please Avoid using this function and create a Cast node instead.
  batch_indices = _cast_Long(
05/31 17:53:32 - mmengine - INFO - Execute onnx optimize passes.
/home/lemon/miniconda3/envs/openmmlab/lib/python3.8/site-packages/torch/onnx/utils.py:1686: UserWarning: The exported ONNX model failed ONNX shape inference. The model will not be executable by the ONNX Runtime. If this is unintended and you believe there is a bug, please report an issue at https://github.com/pytorch/pytorch/issues. Error reported by strict ONNX shape inference: [ShapeInferenceError] (op_type:Where, node name: /Where_11): Y has inconsistent type tensor(float) (Triggered internally at /opt/conda/conda-bld/pytorch_1695392036766/work/torch/csrc/jit/serialization/export.cpp:1415.)
  _C._check_onnx_proto(proto)
05/31 17:53:32 - mmengine - INFO - Finish pipeline mmdeploy.apis.pytorch2onnx.torch2onnx
05/31 17:53:33 - mmengine - INFO - Start pipeline mmdeploy.apis.utils.utils.to_backend in main process
05/31 17:53:33 - mmengine - INFO - Finish pipeline mmdeploy.apis.utils.utils.to_backend
05/31 17:53:33 - mmengine - INFO - visualize onnxruntime model start.
05/31 17:53:35 - mmengine - WARNING - Failed to search registry with scope "mmdet" in the "Codebases" registry tree. As a workaround, the current "Codebases" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmdet" is a correct scope, or whether the registry is initialized.
05/31 17:53:35 - mmengine - WARNING - Failed to search registry with scope "mmdet" in the "mmdet_tasks" registry tree. As a workaround, the current "mmdet_tasks" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmdet" is a correct scope, or whether the registry is initialized.
05/31 17:53:35 - mmengine - WARNING - Failed to search registry with scope "mmdet" in the "backend_detectors" registry tree. As a workaround, the current "backend_detectors" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmdet" is a correct scope, or whether the registry is initialized.
05/31 17:53:35 - mmengine - INFO - Successfully loaded onnxruntime custom ops from /home/lemon/anomaly/mmdeploy/mmdeploy/lib/libmmdeploy_onnxruntime_ops.so
2024-05-31:17:53:35 - root - ERROR - [ONNXRuntimeError] : 1 : FAIL : Load model from mmdeploy_model_output/faster-rcnn/end2end.onnx failed:Type Error: Type parameter (T) of Optype (Where) bound to different types (tensor(int64) and tensor(float) in node (/Where_11).
Traceback (most recent call last):
  File "/home/lemon/anomaly/mmdeploy/mmdeploy/utils/utils.py", line 41, in target_wrapper
    result = target(*args, **kwargs)
  File "/home/lemon/anomaly/mmdeploy/mmdeploy/apis/visualize.py", line 65, in visualize_model
    model = task_processor.build_backend_model(
  File "/home/lemon/anomaly/mmdeploy/mmdeploy/codebase/mmdet/deploy/object_detection.py", line 159, in build_backend_model
    model = build_object_detection_model(
  File "/home/lemon/anomaly/mmdeploy/mmdeploy/codebase/mmdet/deploy/object_detection_model.py", line 1111, in build_object_detection_model
    backend_detector = __BACKEND_MODEL.build(
  File "/home/lemon/miniconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/registry/registry.py", line 570, in build
    return self.build_func(cfg, *args, **kwargs, registry=self)
  File "/home/lemon/miniconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/registry/build_functions.py", line 121, in build_from_cfg
    obj = obj_cls(**args)  # type: ignore
  File "/home/lemon/anomaly/mmdeploy/mmdeploy/codebase/mmdet/deploy/object_detection_model.py", line 56, in __init__
    self._init_wrapper(
  File "/home/lemon/anomaly/mmdeploy/mmdeploy/codebase/mmdet/deploy/object_detection_model.py", line 70, in _init_wrapper
    self.wrapper = BaseBackendModel._build_wrapper(
  File "/home/lemon/anomaly/mmdeploy/mmdeploy/codebase/base/backend_model.py", line 65, in _build_wrapper
    return backend_mgr.build_wrapper(backend_files, device, input_names,
  File "/home/lemon/anomaly/mmdeploy/mmdeploy/backend/onnxruntime/backend_manager.py", line 35, in build_wrapper
    return ORTWrapper(
  File "/home/lemon/anomaly/mmdeploy/mmdeploy/backend/onnxruntime/wrapper.py", line 63, in __init__
    sess = ort.InferenceSession(
  File "/home/lemon/miniconda3/envs/openmmlab/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 283, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "/home/lemon/miniconda3/envs/openmmlab/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 310, in _create_inference_session
    sess = C.InferenceSession(session_options, self._model_path, True, self._read_config_from_model)
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Load model from mmdeploy_model_output/faster-rcnn/end2end.onnx failed:Type Error: Type parameter (T) of Optype (Where) bound to different types (tensor(int64) and tensor(float) in node (/Where_11).
05/31 17:53:35 - mmengine - ERROR - mmdeploy/tools/deploy.py - create_process - 82 - visualize onnxruntime model failed.

package versions,

mmcv              2.1.0
mmdeploy          1.3.1    /home/lemon/anomaly/mmdeploy
mmdeploy-runtime  1.3.1
mmdet             3.3.0    /home/lemon/anomaly/mmdetection
mmengine          0.10.4
onnx              1.16.0
onnxruntime       1.8.1
opencv-python     4.9.0.80
tensorrt          8.6.1

Any help or pointers would be much appreciated!

Ari

Reproduction

python mmdeploy/tools/deploy.py \
    mmdeploy/configs/mmdet/detection/detection_tensorrt_dynamic-320x320-1344x1344.py \
    mmdetection/configs/faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py \
    checkpoints/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth \
    mmdetection/demo/demo.jpg \
    --work-dir mmdeploy_model/faster-rcnn \
    --device cuda \
    --dump-info

Environment

1.

05/31 18:50:26 - mmengine - INFO - 

05/31 18:50:26 - mmengine - INFO - **********Environmental information**********
05/31 18:50:27 - mmengine - INFO - sys.platform: linux
05/31 18:50:27 - mmengine - INFO - Python: 3.8.19 (default, Mar 20 2024, 19:58:24) [GCC 11.2.0]
05/31 18:50:27 - mmengine - INFO - CUDA available: True
05/31 18:50:27 - mmengine - INFO - MUSA available: False
05/31 18:50:27 - mmengine - INFO - numpy_random_seed: 2147483648
05/31 18:50:27 - mmengine - INFO - GPU 0: NVIDIA GeForce RTX 3090
05/31 18:50:27 - mmengine - INFO - CUDA_HOME: /usr/local/cuda
05/31 18:50:27 - mmengine - INFO - NVCC: Cuda compilation tools, release 12.2, V12.2.140
05/31 18:50:27 - mmengine - INFO - GCC: gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
05/31 18:50:27 - mmengine - INFO - PyTorch: 2.1.0
05/31 18:50:27 - mmengine - INFO - PyTorch compiling details: PyTorch built with:
  - GCC 9.3
  - C++ Version: 201703
  - Intel(R) oneAPI Math Kernel Library Version 2023.1-Product Build 20230303 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v3.1.1 (Git Hash 64f6bcbcbab628e96f33a62c3e975f8535a7bde4)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 12.1
  - NVCC architecture flags: -gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90
  - CuDNN 8.9.2
  - Magma 2.6.1
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=12.1, CUDNN_VERSION=8.9.2, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -Wno-invalid-partial-specialization -Wno-unused-private-field -Wno-aligned-allocation-unavailable -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_DISABLE_GPU_ASSERTS=ON, TORCH_VERSION=2.1.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, 

05/31 18:50:27 - mmengine - INFO - TorchVision: 0.16.0
05/31 18:50:27 - mmengine - INFO - OpenCV: 4.9.0
05/31 18:50:27 - mmengine - INFO - MMEngine: 0.10.4
05/31 18:50:27 - mmengine - INFO - MMCV: 2.1.0
05/31 18:50:27 - mmengine - INFO - MMCV Compiler: GCC 9.3
05/31 18:50:27 - mmengine - INFO - MMCV CUDA Compiler: 12.1
05/31 18:50:27 - mmengine - INFO - MMDeploy: 1.3.1+bc75c9d
05/31 18:50:27 - mmengine - INFO - 

05/31 18:50:27 - mmengine - INFO - **********Backend information**********
05/31 18:50:27 - mmengine - INFO - tensorrt:    8.6.1
05/31 18:50:27 - mmengine - INFO - tensorrt custom ops: NotAvailable
05/31 18:50:27 - mmengine - INFO - ONNXRuntime: None
05/31 18:50:27 - mmengine - INFO - ONNXRuntime-gpu:     1.18.0
05/31 18:50:27 - mmengine - INFO - ONNXRuntime custom ops:      Available
05/31 18:50:27 - mmengine - INFO - pplnn:       None
05/31 18:50:27 - mmengine - INFO - ncnn:        None
05/31 18:50:27 - mmengine - INFO - snpe:        None
05/31 18:50:27 - mmengine - INFO - openvino:    None
05/31 18:50:27 - mmengine - INFO - torchscript: 2.1.0
05/31 18:50:27 - mmengine - INFO - torchscript custom ops:      NotAvailable
05/31 18:50:27 - mmengine - INFO - rknn-toolkit:        None
05/31 18:50:27 - mmengine - INFO - rknn-toolkit2:       None
05/31 18:50:27 - mmengine - INFO - ascend:      None
05/31 18:50:27 - mmengine - INFO - coreml:      None
05/31 18:50:27 - mmengine - INFO - tvm: None
05/31 18:50:27 - mmengine - INFO - vacc:        None
05/31 18:50:27 - mmengine - INFO - 

05/31 18:50:27 - mmengine - INFO - **********Codebase information**********
05/31 18:50:27 - mmengine - INFO - mmdet:       3.3.0
05/31 18:50:27 - mmengine - INFO - mmseg:       None
05/31 18:50:27 - mmengine - INFO - mmpretrain:  None
05/31 18:50:27 - mmengine - INFO - mmocr:       None
05/31 18:50:27 - mmengine - INFO - mmagic:      None
05/31 18:50:27 - mmengine - INFO - mmdet3d:     None
05/31 18:50:27 - mmengine - INFO - mmpose:      None
05/31 18:50:27 - mmengine - INFO - mmrotate:    None
05/31 18:50:27 - mmengine - INFO - mmaction:    None
05/31 18:50:27 - mmengine - INFO - mmrazor:     None
05/31 18:50:27 - mmengine - INFO - mmyolo:      None

2.

$LD_LIBRARY_PATH
/home/lemon/anomaly/mmetc/onnxruntime-linux-x64-cuda-1.17.0/lib:
/home/lemon/anomaly/mmetc/ppl.cv/lib:
/home/lemon/anomaly/mmetc/ppl.cv/build/install/lib/:
/home/lemon/anomaly/mmetc/ppl.cv/cuda/lib64:
/home/lemon/anomaly/mmetc/onnxruntime-linux-x64-cuda-1.17.0/lib:
/home/lemon/miniconda3/envs/openmmlab/lib:
/home/lemon/anomaly/mmetc/TensorRT-8.6.1.6/lib:
/usr/local/cuda/lib64:
/home/lemon/anomaly/mmetc/onnxruntime-linux-x64-cuda-1.17.0/lib:
/home/lemon/miniconda3/envs/openmmlab/lib:
/home/lemon/anomaly/mmetc/TensorRT-8.6.1.6/lib:
/usr/local/cuda/lib64::/build

$PATH
/home/lemon/miniconda3/envs/openmmlab/bin:
/home/lemon/miniconda3/condabin:
/usr/bin:
/usr/local/cuda/bin:
/home/cherry/.conda:
/home/lemon/.vscode-server/bin/dc96b837cf6bb4af9cd736aa3af08cf8279f7685/bin/remote-cli:
/usr/local/sbin:
/usr/local/bin:
/usr/sbin:
/usr/bin:
/sbin:
/bin:
/usr/games:
/usr/local/games:
/snap/bin:
/opt/ngc-cli:
/opt/ngc-cli

$PYTHONPATH
NA

Error traceback

onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Load model from mmdeploy_model/faster-rcnn/end2end.onnx failed:Fatal error: mmdeploy:GatherTopk(-1) is not a registered function/op
limesqueezy commented 5 months ago

To solve the above we had to build from source, where the major difficulty was building and linking ppl.cv, which in turn also had to be built from source and has a major bug here.
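For anyone after the same thing, this is roughly the shape of that build, per the build-from-source docs at the time; all directories are placeholders for wherever TensorRT, cuDNN, ONNX Runtime, and ppl.cv actually live, so treat it as a sketch rather than a drop-in script:

# ppl.cv first; mmdeploy's CUDA ops link against it.
git clone https://github.com/openppl-public/ppl.cv.git
cd ppl.cv && ./build.sh cuda && cd ..

# Then mmdeploy's custom ops against TensorRT and ONNX Runtime.
cd mmdeploy && mkdir -p build && cd build
cmake .. \
    -DMMDEPLOY_TARGET_BACKENDS="trt;ort" \
    -DTENSORRT_DIR=/path/to/TensorRT-8.6.1.6 \
    -DCUDNN_DIR=/path/to/cudnn \
    -DONNXRUNTIME_DIR=/path/to/onnxruntime-linux-x64-cuda-1.17.0 \
    -Dpplcv_DIR=/path/to/ppl.cv/cuda-build/install/lib/cmake/ppl
make -j"$(nproc)" && make install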

FellowPlanter commented 3 weeks ago

Do you still happen to have the build script? I am running into the exact same issues.