open-mmlab / mmdeploy

OpenMMLab Model Deployment Framework
https://mmdeploy.readthedocs.io/en/latest/
Apache License 2.0

[Bug] Wrong Onnx Execution Provider - Error when binding input: There's no data transfer registered for copying tensors from Device:[DeviceType:1 MemoryType:0 DeviceId:0] to Device:[DeviceType:0 MemoryType:0 DeviceId:0] #2742

Open ehratjon opened 6 months ago

ehratjon commented 6 months ago

Describe the bug

When running the model conversion tool using an AMD GPU, the command fails with:

RuntimeError: Error when binding input: There's no data transfer registered for copying tensors from Device:[DeviceType:1 MemoryType:0 DeviceId:0] to Device:[DeviceType:0 MemoryType:0 DeviceId:0]
mmengine - ERROR - /home/user/Projects/mm_deploy/mmdeploy/tools/deploy.py - create_process - 82 - visualize onnxruntime model failed.
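
For readers hitting the same message, the two `DeviceType` fields can be decoded with a small helper. This is only a sketch: the mapping (CPU=0, GPU=1, FPGA=2) is assumed from ONNX Runtime's `OrtDevice` definition, and `decode_transfer_error` is a hypothetical name, not part of any library. Read this way, the error describes a copy from GPU memory to a CPU device, which would be consistent with the session having fallen back to the CPU provider while the input tensor was still on the GPU.

```python
import re

# Assumed device type codes, following ONNX Runtime's OrtDevice (CPU=0, GPU=1, FPGA=2).
DEVICE_TYPES = {0: "CPU", 1: "GPU", 2: "FPGA"}


def decode_transfer_error(message):
    """Return (source, destination) device names from a 'no data transfer' error.

    Hypothetical helper for illustration; it only parses the message text.
    """
    ids = [int(m) for m in re.findall(r"DeviceType:(\d+)", message)]
    if len(ids) != 2:
        raise ValueError("expected exactly two DeviceType fields")
    return tuple(DEVICE_TYPES.get(i, f"unknown({i})") for i in ids)


msg = ("Error when binding input: There's no data transfer registered for "
       "copying tensors from Device:[DeviceType:1 MemoryType:0 DeviceId:0] "
       "to Device:[DeviceType:0 MemoryType:0 DeviceId:0]")
print(decode_transfer_error(msg))  # ('GPU', 'CPU')
```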

I made sure that the GPU is accessible:

>>> import sys
>>> import torch
>>> import onnxruntime as ort
>>> sys.version
'3.10.0 | packaged by conda-forge | (default, Nov 20 2021, 02:24:10) [GCC 9.4.0]'
>>> torch.__version__
'2.4.0.dev20240417+rocm6.0'
>>> torch.cuda.is_available()
True
>>> ort.__version__
'1.17.0'
>>> ort.get_available_providers()
['MIGraphXExecutionProvider', 'ROCMExecutionProvider', 'CPUExecutionProvider']

While searching for a solution, I found that the following user warning is generated:

/home/user/miniconda3/envs/venv_deploy/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py:70: UserWarning: Specified provider 'CUDAExecutionProvider' is not in available provider names.

It seems that ONNX Runtime defaults to 'CUDAExecutionProvider' instead of one of the available providers, even though, as the Python session above shows, ONNX Runtime does find the correct execution providers (EPs).

Hacky solution: Going into the library package, opening onnxruntime_inference_collection.py, and forcing 'providers' to be set to ROCMExecutionProvider in the _create_inference_session function solves the issue. However, hard-coding something into a library seems like a very bad idea and defeats the purpose of a modular config system.

So my question: Is there a way to specify the execution provider through the mmdeploy config system? Or did I miss some other obvious setting that enables conversion to ONNX on an AMD GPU?
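
Outside of mmdeploy, ONNX Runtime's `InferenceSession` accepts a `providers` list, so one possible direction is to filter a preference order against `ort.get_available_providers()` instead of hard-coding a single provider. The sketch below is an assumption-laden illustration: `pick_providers` and its preference order are hypothetical, and nothing in this thread confirms that mmdeploy exposes such a setting through its config system.

```python
# Hypothetical preference order: AMD providers first, CPU as the last resort.
PREFERRED = (
    "ROCMExecutionProvider",
    "MIGraphXExecutionProvider",
    "CUDAExecutionProvider",
    "CPUExecutionProvider",
)


def pick_providers(available, preferred=PREFERRED):
    """Return the preferred providers that are actually available, in order."""
    chosen = [p for p in preferred if p in available]
    if not chosen:
        raise ValueError(f"none of {list(preferred)} found in {list(available)}")
    return chosen


# Provider list as reported by ort.get_available_providers() in this issue.
available = ["MIGraphXExecutionProvider", "ROCMExecutionProvider", "CPUExecutionProvider"]
print(pick_providers(available))

# With onnxruntime installed, the result could be passed to a session, e.g.:
#   import onnxruntime as ort
#   sess = ort.InferenceSession(
#       "mmdeploy_model/yolo/end2end.onnx",
#       providers=pick_providers(ort.get_available_providers()),
#   )
# Models exported with mmdeploy custom ops would also need the custom ops
# library registered on the session options before the session is created.
```

Filtering against the available list avoids the "Specified provider ... is not in available provider names" warning, because only providers that the installed build reports are ever requested.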

Reproduction

I followed the model conversion guide, changing the deployment config to the ONNX Runtime one and the model to YOLOv3.

python mmdeploy/tools/deploy.py \
    mmdeploy/configs/mmdet/detection/detection_onnxruntime_dynamic.py  \
    mmdetection/configs/yolo/yolov3_d53_8xb8-ms-608-273e_coco.py \
    checkpoints/yolov3_d53_mstrain-608_273e_coco_20210518_115020-a2c3acb8.pth \
    mmdetection/demo/demo.jpg \
    --work-dir mmdeploy_model/yolo \
    --device cuda \
    --dump-info

Environment

04/19 16:03:48 - mmengine - INFO - 

04/19 16:03:48 - mmengine - INFO - **********Environmental information**********
/bin/sh: 1: /opt/rocm-6.0.2/bin/nvcc: not found
/bin/sh: 1: /opt/rocm-6.0.2/bin/nvcc: not found
04/19 16:03:49 - mmengine - INFO - sys.platform: linux
04/19 16:03:49 - mmengine - INFO - Python: 3.10.0 | packaged by conda-forge | (default, Nov 20 2021, 02:24:10) [GCC 9.4.0]
04/19 16:03:49 - mmengine - INFO - CUDA available: True
04/19 16:03:49 - mmengine - INFO - MUSA available: False
04/19 16:03:49 - mmengine - INFO - numpy_random_seed: 2147483648
04/19 16:03:49 - mmengine - INFO - GPU 0: AMD Radeon RX 6800 XT
04/19 16:03:49 - mmengine - INFO - CUDA_HOME: /opt/rocm-6.0.2
04/19 16:03:49 - mmengine - INFO - NVCC: Not Available
04/19 16:03:49 - mmengine - INFO - GCC: gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
04/19 16:03:49 - mmengine - INFO - PyTorch: 2.4.0.dev20240417+rocm6.0
04/19 16:03:49 - mmengine - INFO - PyTorch compiling details: PyTorch built with:
  - GCC 9.3
  - C++ Version: 201703
  - Intel(R) oneAPI Math Kernel Library Version 2022.2-Product Build 20220804 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v3.3.6 (Git Hash 86e6af5974177e513fd3fee58425e1063e7f1361)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX512
  - HIP Runtime 6.0.32830
  - MIOpen 3.0.0
  - Magma 2.7.2
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DTMP_LIBKINETO_NANOSECOND -DLIBKINETO_NOCUPTI -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=2.4.0, USE_CUDA=OFF, USE_CUDNN=OFF, USE_CUSPARSELT=OFF, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_GLOO=ON, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=ON, USE_ROCM_KERNEL_ASSERT=OFF, 

04/19 16:03:49 - mmengine - INFO - TorchVision: 0.19.0.dev20240417+rocm6.0
04/19 16:03:49 - mmengine - INFO - OpenCV: 4.9.0
04/19 16:03:49 - mmengine - INFO - MMEngine: 0.10.3
04/19 16:03:49 - mmengine - INFO - MMCV: 2.1.0
04/19 16:03:49 - mmengine - INFO - MMCV Compiler: GCC 11.4
04/19 16:03:49 - mmengine - INFO - MMCV CUDA Compiler: 60032830
04/19 16:03:49 - mmengine - INFO - MMDeploy: 1.3.1+
04/19 16:03:49 - mmengine - INFO - 

04/19 16:03:49 - mmengine - INFO - **********Backend information**********
04/19 16:03:49 - mmengine - INFO - tensorrt:    None
04/19 16:03:49 - mmengine - INFO - ONNXRuntime: None
04/19 16:03:49 - mmengine - INFO - ONNXRuntime-gpu:     None
04/19 16:03:49 - mmengine - INFO - ONNXRuntime custom ops:      Available
04/19 16:03:49 - mmengine - INFO - pplnn:       None
04/19 16:03:49 - mmengine - INFO - ncnn:        None
04/19 16:03:49 - mmengine - INFO - snpe:        None
04/19 16:03:49 - mmengine - INFO - openvino:    None
04/19 16:03:49 - mmengine - INFO - torchscript: 2.4.0.dev20240417+rocm6.0
04/19 16:03:49 - mmengine - INFO - torchscript custom ops:      NotAvailable
04/19 16:03:49 - mmengine - INFO - rknn-toolkit:        None
04/19 16:03:49 - mmengine - INFO - rknn-toolkit2:       None
04/19 16:03:49 - mmengine - INFO - ascend:      None
04/19 16:03:49 - mmengine - INFO - coreml:      None
04/19 16:03:49 - mmengine - INFO - tvm: None
04/19 16:03:49 - mmengine - INFO - vacc:        None
04/19 16:03:49 - mmengine - INFO - 

04/19 16:03:49 - mmengine - INFO - **********Codebase information**********
04/19 16:03:49 - mmengine - INFO - mmdet:       3.3.0
04/19 16:03:49 - mmengine - INFO - mmseg:       None
04/19 16:03:49 - mmengine - INFO - mmpretrain:  None
04/19 16:03:49 - mmengine - INFO - mmocr:       None
04/19 16:03:49 - mmengine - INFO - mmagic:      None
04/19 16:03:49 - mmengine - INFO - mmdet3d:     None
04/19 16:03:49 - mmengine - INFO - mmpose:      None
04/19 16:03:49 - mmengine - INFO - mmrotate:    None
04/19 16:03:49 - mmengine - INFO - mmaction:    None
04/19 16:03:49 - mmengine - INFO - mmrazor:     None
04/19 16:03:49 - mmengine - INFO - mmyolo:      None

Error traceback

04/19 16:06:22 - mmengine - WARNING - Failed to search registry with scope "mmdet" in the "Codebases" registry tree. As a workaround, the current "Codebases" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmdet" is a correct scope, or whether the registry is initialized.
04/19 16:06:22 - mmengine - WARNING - Failed to search registry with scope "mmdet" in the "mmdet_tasks" registry tree. As a workaround, the current "mmdet_tasks" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmdet" is a correct scope, or whether the registry is initialized.
04/19 16:06:24 - mmengine - INFO - Start pipeline mmdeploy.apis.pytorch2onnx.torch2onnx in subprocess
04/19 16:06:25 - mmengine - WARNING - Failed to search registry with scope "mmdet" in the "Codebases" registry tree. As a workaround, the current "Codebases" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmdet" is a correct scope, or whether the registry is initialized.
04/19 16:06:25 - mmengine - WARNING - Failed to search registry with scope "mmdet" in the "mmdet_tasks" registry tree. As a workaround, the current "mmdet_tasks" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmdet" is a correct scope, or whether the registry is initialized.
Loads checkpoint by local backend from path: checkpoints/yolov3_d53_mstrain-608_273e_coco_20210518_115020-a2c3acb8.pth
04/19 16:06:26 - mmengine - WARNING - DeprecationWarning: get_onnx_config will be deprecated in the future. 
04/19 16:06:26 - mmengine - INFO - Export PyTorch model to ONNX: mmdeploy_model/yolo/end2end.onnx.
/home/user/miniconda3/envs/venv_deploy/lib/python3.10/site-packages/mmdeploy/core/optimizers/function_marker.py:160: TracerWarning: Converting a tensor to a Python integer might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  ys_shape = tuple(int(s) for s in ys.shape)
/home/user/Projects/mm_deploy/mmdetection/mmdet/models/task_modules/prior_generators/anchor_generator.py:356: UserWarning: ``grid_anchors`` would be deprecated soon. Please use ``grid_priors`` 
  warnings.warn('``grid_anchors`` would be deprecated soon. '
/home/user/Projects/mm_deploy/mmdetection/mmdet/models/task_modules/prior_generators/anchor_generator.py:392: UserWarning: ``single_level_grid_anchors`` would be deprecated soon. Please use ``single_level_grid_priors`` 
  warnings.warn(
/home/user/Projects/mm_deploy/mmdetection/mmdet/models/task_modules/coders/yolo_bbox_coder.py:81: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert pred_bboxes.size(-1) == bboxes.size(-1) == 4
/home/user/miniconda3/envs/venv_deploy/lib/python3.10/site-packages/mmdeploy/pytorch/functions/topk.py:28: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  k = torch.tensor(k, device=input.device, dtype=torch.long)
/home/user/miniconda3/envs/venv_deploy/lib/python3.10/site-packages/mmdeploy/mmcv/ops/nms.py:285: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  iou_threshold = torch.tensor([iou_threshold], dtype=torch.float32)
/home/user/miniconda3/envs/venv_deploy/lib/python3.10/site-packages/mmdeploy/mmcv/ops/nms.py:286: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  score_threshold = torch.tensor([score_threshold], dtype=torch.float32)
/home/user/miniconda3/envs/venv_deploy/lib/python3.10/site-packages/mmdeploy/mmcv/ops/nms.py:45: TracerWarning: Converting a tensor to a Python float might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  score_threshold = float(score_threshold)
/home/user/miniconda3/envs/venv_deploy/lib/python3.10/site-packages/mmdeploy/mmcv/ops/nms.py:46: TracerWarning: Converting a tensor to a Python float might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  iou_threshold = float(iou_threshold)
/home/user/miniconda3/envs/venv_deploy/lib/python3.10/site-packages/mmcv/ops/nms.py:123: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert boxes.size(1) == 4
/home/user/miniconda3/envs/venv_deploy/lib/python3.10/site-packages/mmcv/ops/nms.py:124: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert boxes.size(0) == scores.size(0)
/home/user/miniconda3/envs/venv_deploy/lib/python3.10/site-packages/torch/onnx/symbolic_opset9.py:5673: UserWarning: Exporting aten::index operator of advanced indexing in opset 11 is achieved by combination of multiple ONNX operators, including Reshape, Transpose, Concat, and Gather. If indices include negative values, the exported graph will produce incorrect results.
  warnings.warn(
04/19 16:06:29 - mmengine - INFO - Execute onnx optimize passes.
04/19 16:06:29 - mmengine - WARNING - Can not optimize model, please build torchscipt extension.
More details: https://github.com/open-mmlab/mmdeploy/tree/main/docs/en/experimental/onnx_optimizer.md
04/19 16:06:30 - mmengine - INFO - Finish pipeline mmdeploy.apis.pytorch2onnx.torch2onnx
04/19 16:06:31 - mmengine - INFO - Start pipeline mmdeploy.apis.utils.utils.to_backend in main process
04/19 16:06:31 - mmengine - INFO - Finish pipeline mmdeploy.apis.utils.utils.to_backend
04/19 16:06:31 - mmengine - INFO - visualize onnxruntime model start.
04/19 16:06:34 - mmengine - WARNING - Failed to search registry with scope "mmdet" in the "Codebases" registry tree. As a workaround, the current "Codebases" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmdet" is a correct scope, or whether the registry is initialized.
04/19 16:06:34 - mmengine - WARNING - Failed to search registry with scope "mmdet" in the "mmdet_tasks" registry tree. As a workaround, the current "mmdet_tasks" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmdet" is a correct scope, or whether the registry is initialized.
04/19 16:06:34 - mmengine - WARNING - Failed to search registry with scope "mmdet" in the "backend_detectors" registry tree. As a workaround, the current "backend_detectors" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmdet" is a correct scope, or whether the registry is initialized.
04/19 16:06:34 - mmengine - INFO - Successfully loaded onnxruntime custom ops from /home/user/miniconda3/envs/venv_deploy/lib/python3.10/site-packages/mmdeploy/lib/libmmdeploy_onnxruntime_ops.so
/home/user/miniconda3/envs/venv_deploy/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py:70: UserWarning: Specified provider 'CUDAExecutionProvider' is not in available provider names.Available providers: 'MIGraphXExecutionProvider, ROCMExecutionProvider, CPUExecutionProvider'
2024-04-19:16:06:35 - root - ERROR - Error when binding input: There's no data transfer registered for copying tensors from Device:[DeviceType:1 MemoryType:0 DeviceId:0] to Device:[DeviceType:0 MemoryType:0 DeviceId:0]

Traceback (most recent call last):
  File "/home/user/miniconda3/envs/venv_deploy/lib/python3.10/site-packages/mmdeploy/utils/utils.py", line 41, in target_wrapper
    result = target(*args, **kwargs)
  File "/home/user/miniconda3/envs/venv_deploy/lib/python3.10/site-packages/mmdeploy/apis/visualize.py", line 72, in visualize_model
    result = model.test_step(model_inputs)[0]
  File "/home/user/miniconda3/envs/venv_deploy/lib/python3.10/site-packages/mmengine/model/base_model/base_model.py", line 145, in test_step
    return self._run_forward(data, mode='predict')  # type: ignore
  File "/home/user/miniconda3/envs/venv_deploy/lib/python3.10/site-packages/mmengine/model/base_model/base_model.py", line 361, in _run_forward
    results = self(**data, mode=mode)
  File "/home/user/miniconda3/envs/venv_deploy/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/user/miniconda3/envs/venv_deploy/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/user/miniconda3/envs/venv_deploy/lib/python3.10/site-packages/mmdeploy/codebase/mmdet/deploy/object_detection_model.py", line 296, in forward
    outputs = self.predict(inputs)
  File "/home/user/miniconda3/envs/venv_deploy/lib/python3.10/site-packages/mmdeploy/codebase/mmdet/deploy/object_detection_model.py", line 313, in predict
    outputs = self.wrapper({self.input_name: imgs})
  File "/home/user/miniconda3/envs/venv_deploy/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/user/miniconda3/envs/venv_deploy/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/user/miniconda3/envs/venv_deploy/lib/python3.10/site-packages/mmdeploy/backend/onnxruntime/wrapper.py", line 97, in forward
    self.io_binding.bind_input(
  File "/home/user/miniconda3/envs/venv_deploy/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 560, in bind_input
    self._iobinding.bind_input(
RuntimeError: Error when binding input: There's no data transfer registered for copying tensors from Device:[DeviceType:1 MemoryType:0 DeviceId:0] to Device:[DeviceType:0 MemoryType:0 DeviceId:0]
04/19 16:06:35 - mmengine - ERROR - /home/user/Projects/mm_deploy/mmdeploy/tools/deploy.py - create_process - 82 - visualize onnxruntime model failed.

zhouyizhuo commented 6 months ago

same problem