open-mmlab / mmdeploy

OpenMMLab Model Deployment Framework
https://mmdeploy.readthedocs.io/en/latest/
Apache License 2.0

[Bug] c_sdk API Inferences error #1632

Open SongYii opened 1 year ago

SongYii commented 1 year ago


Describe the bug

c_sdk API inference error

Reproduction

./pose_detection cpu ../../mmdeploy_model/hrnet/ ../../demo/resources/human-pose.jpg

Environment

2023-01-10 16:35:40,065 - mmdeploy - INFO -

2023-01-10 16:35:40,065 - mmdeploy - INFO - **********Environmental information**********
2023-01-10 16:35:41,427 - mmdeploy - INFO - sys.platform: linux
2023-01-10 16:35:41,428 - mmdeploy - INFO - Python: 3.9.15 (main, Nov 24 2022, 14:31:59) [GCC 11.2.0]
2023-01-10 16:35:41,428 - mmdeploy - INFO - CUDA available: True
2023-01-10 16:35:41,428 - mmdeploy - INFO - GPU 0: NVIDIA GeForce RTX 3090 Ti
2023-01-10 16:35:41,428 - mmdeploy - INFO - CUDA_HOME: /home/ps/anaconda3/envs/yisong_dis
2023-01-10 16:35:41,428 - mmdeploy - INFO - NVCC: Cuda compilation tools, release 11.7, V11.7.99
2023-01-10 16:35:41,428 - mmdeploy - INFO - GCC: gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0
2023-01-10 16:35:41,428 - mmdeploy - INFO - PyTorch: 1.13.1
2023-01-10 16:35:41,428 - mmdeploy - INFO - PyTorch compiling details: PyTorch built with:
  - GCC 9.3
  - C++ Version: 201402
  - Intel(R) oneAPI Math Kernel Library Version 2021.4-Product Build 20210904 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.6.0 (Git Hash 52b5f107dd9cf10910aaa19cb47f3abf9b349815)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 11.7
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
  - CuDNN 8.5
  - Magma 2.6.1
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.7, CUDNN_VERSION=8.5.0, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -fabi-version=11 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wunused-local-typedefs -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.13.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,

2023-01-10 16:35:41,428 - mmdeploy - INFO - TorchVision: 0.14.1
2023-01-10 16:35:41,428 - mmdeploy - INFO - OpenCV: 4.5.4
2023-01-10 16:35:41,428 - mmdeploy - INFO - MMCV: 1.7.1
2023-01-10 16:35:41,428 - mmdeploy - INFO - MMCV Compiler: GCC 9.3
2023-01-10 16:35:41,428 - mmdeploy - INFO - MMCV CUDA Compiler: 11.7
2023-01-10 16:35:41,428 - mmdeploy - INFO - MMDeploy: 0.12.0+e4ad0d4
2023-01-10 16:35:41,428 - mmdeploy - INFO -

2023-01-10 16:35:41,428 - mmdeploy - INFO - **********Backend information**********
2023-01-10 16:35:42,044 - mmdeploy - INFO - tensorrt:   None
2023-01-10 16:35:42,755 - mmdeploy - INFO - ONNXRuntime:    1.8.1
2023-01-10 16:35:42,755 - mmdeploy - INFO - ONNXRuntime-gpu:    1.13.1
2023-01-10 16:35:42,756 - mmdeploy - INFO - ONNXRuntime custom ops: Available
2023-01-10 16:35:42,892 - mmdeploy - INFO - pplnn:  None
2023-01-10 16:35:42,976 - mmdeploy - INFO - ncnn:   None
2023-01-10 16:35:42,980 - mmdeploy - INFO - snpe:   None
2023-01-10 16:35:43,495 - mmdeploy - INFO - openvino:   2022.3.0
2023-01-10 16:35:43,512 - mmdeploy - INFO - torchscript:    1.13.1
2023-01-10 16:35:43,513 - mmdeploy - INFO - torchscript custom ops: NotAvailable
2023-01-10 16:35:43,657 - mmdeploy - INFO - rknn-toolkit:   None
2023-01-10 16:35:43,657 - mmdeploy - INFO - rknn2-toolkit:  None
2023-01-10 16:35:43,683 - mmdeploy - INFO - ascend: None
2023-01-10 16:35:43,688 - mmdeploy - INFO - coreml: None
2023-01-10 16:35:43,850 - mmdeploy - INFO - tvm:    None
2023-01-10 16:35:43,850 - mmdeploy - INFO -

2023-01-10 16:35:43,850 - mmdeploy - INFO - **********Codebase information**********
2023-01-10 16:35:43,875 - mmdeploy - INFO - mmdet:  None
2023-01-10 16:35:43,875 - mmdeploy - INFO - mmseg:  0.29.1
2023-01-10 16:35:43,875 - mmdeploy - INFO - mmcls:  None
2023-01-10 16:35:43,875 - mmdeploy - INFO - mmocr:  None
2023-01-10 16:35:43,875 - mmdeploy - INFO - mmedit: 0.16.0
2023-01-10 16:35:43,875 - mmdeploy - INFO - mmdet3d:    None
2023-01-10 16:35:43,875 - mmdeploy - INFO - mmpose: None
2023-01-10 16:35:43,875 - mmdeploy - INFO - mmrotate:   None
2023-01-10 16:35:43,875 - mmdeploy - INFO - mmaction:   None

Error traceback

[2023-01-10 16:26:24.589] [mmdeploy] [info] [model.cpp:35] [DirectoryModel] Load model: "../../mmdeploy_model/hrnet/"
[2023-01-10 16:26:25.105] [mmdeploy] [info] [inference.cpp:44] {
  "context": {
    "device": "<any>",
    "model": "<any>",
    "stream": "<any>"
  },
  "pipeline": {
    "input": [
      "img"
    ],
    "output": [
      "post_output"
    ],
    "tasks": [
      {
        "input": [
          "img"
        ],
        "module": "Transform",
        "name": "Preprocess",
        "output": [
          "prep_output"
        ],
        "transforms": [
          {
            "type": "LoadImageFromFile"
          },
          {
            "image_size": [
              192,
              256
            ],
            "padding": 1.25,
            "type": "TopDownGetBboxCenterScale"
          },
          {
            "image_size": [
              192,
              256
            ],
            "type": "TopDownAffine"
          },
          {
            "mean": [
              123.675,
              116.28,
              103.53
            ],
            "std": [
              58.395,
              57.120000000000005,
              57.375
            ],
            "to_rgb": true,
            "type": "Normalize"
          },
          {
            "keys": [
              "img"
            ],
            "type": "ImageToTensor"
          },
          {
            "keys": [
              "img"
            ],
            "meta_keys": [
              "flip_direction",
              "rotation",
              "img_shape",
              "filename",
              "ori_filename",
              "ori_shape",
              "flip",
              "img_norm_cfg",
              "center",
              "scale_factor",
              "valid_ratio",
              "image_file",
              "pad_shape",
              "scale",
              "bbox_score",
              "flip_pairs"
            ],
            "type": "Collect"
          }
        ],
        "type": "Task"
      },
      {
        "input": [
          "prep_output"
        ],
        "input_map": {
          "img": "input"
        },
        "is_batched": false,
        "module": "Net",
        "name": "topdown",
        "output": [
          "infer_output"
        ],
        "output_map": {},
        "type": "Task"
      },
      {
        "component": "TopdownHeatmapSimpleHeadDecode",
        "input": [
          "prep_output",
          "infer_output"
        ],
        "module": "mmpose",
        "name": "postprocess",
        "output": [
          "post_output"
        ],
        "params": {
          "flip_test": true,
          "modulate_kernel": 11,
          "post_process": "default",
          "shift_heatmap": true
        },
        "type": "Task"
      }
    ]
  }
}
[2023-01-10 16:26:49.166] [mmdeploy] [info] [inference.cpp:56] ["img"] <- ["image"]
[2023-01-10 16:26:49.167] [mmdeploy] [info] [inference.cpp:67] ["post_output"] -> ["dets"]
/opt/rh/devtoolset-9/root/usr/include/c++/9/bits/stl_vector.h:1042: std::vector<_Tp, _Alloc>::reference std::vector<_Tp, _Alloc>::operator[](std::vector<_Tp, _Alloc>::size_type) [with _Tp = int; _Alloc = std::allocator<int>; std::vector<_Tp, _Alloc>::reference = int&; std::vector<_Tp, _Alloc>::size_type = long unsigned int]: Assertion '__builtin_expect(__n < this->size(), true)' failed.
Aborted (core dumped)
irexyc commented 1 year ago

It seems your compiler enforces -D_GLIBCXX_ASSERTIONS by default, which turns on assertions in some C++11 constructs. However, I can't reproduce this with -D_GLIBCXX_ASSERTIONS on my machine.

You may use gdb and step through the code to find the root cause of this error, or you can try building mmdeploy with -DCMAKE_CXX_FLAGS=-U_GLIBCXX_ASSERTIONS.
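
For example (a minimal sketch; the build layout and paths are assumptions based on the demo command above):

# rebuild with libstdc++ assertions disabled
cd mmdeploy/build
cmake .. -DCMAKE_CXX_FLAGS=-U_GLIBCXX_ASSERTIONS
make -j$(nproc)

# or run the demo under gdb and grab a backtrace at the abort
gdb --args ./pose_detection cpu ../../mmdeploy_model/hrnet/ ../../demo/resources/human-pose.jpg
(gdb) run
(gdb) bt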

NicolasPetermann134 commented 1 year ago

Could you solve the problem? I'm facing the same issue. Would appreciate any help a lot.

irexyc commented 1 year ago

@NicolasPetermann134

Hi, are you using the onnxruntime backend? Can you run inference on the model with the pure onnxruntime API?

According to https://github.com/open-mmlab/mmdeploy/issues/2191, I guess the error may be due to loading the onnxruntime custom operator library. As it cannot be reproduced on my machine, could you please verify it?

The input/output names and shapes may differ depending on your ONNX model; you can check them with Netron. The custom onnxruntime operator library can be found in your Python package installation path.
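
You can also inspect the names and shapes programmatically (a minimal sketch; the model path is a placeholder):

import onnxruntime as ort
sess = ort.InferenceSession('/path/to/onnx', None, ['CPUExecutionProvider'])
print([(i.name, i.shape) for i in sess.get_inputs()])   # input names and shapes
print([(o.name, o.shape) for o in sess.get_outputs()])  # output names and shapes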

with pure onnxruntime api

import onnxruntime as ort
import numpy as np
sess = ort.InferenceSession('/path/to/onnx', None, ['CPUExecutionProvider'])
input = np.random.randn(1, 3, 224, 224).astype(np.float32) # b, c, h, w
output = sess.run(['output'], input_feed={'input': input})

onnxruntime api with custom operator library

import onnxruntime as ort
import numpy as np
session_options = ort.SessionOptions()
session_options.register_custom_ops_library('/path/to/libmmdeploy_ort_net.so') # if the problem occurs when converting the model, the library should be libmmdeploy_onnxruntime_ops.so instead
sess = ort.InferenceSession('/path/to/onnx', session_options, ['CPUExecutionProvider'])
input = np.random.randn(1, 3, 224, 224).astype(np.float32) # b, c, h, w
output = sess.run(['output'], input_feed={'input': input})
NicolasPetermann134 commented 1 year ago

Hi Chen Xin,

Many thanks for your reply. I can verify that I can run both the mmdet and mmpose models with the pure onnxruntime API!

But I did indeed get a loading error with the custom operator library:

Traceback (most recent call last):
  File "/home/nico/PycharmProjects/deepGym/mmdeploy_old/pure_ort_api.py", line 14, in <module>
    session_options.register_custom_ops_library(path_custom_ops) # if you face the problem when convert the model, the lib should be libmmdeploy_onnxruntime_ops.so
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Failed to load library /home/nico/miniconda3/envs/openmmlab/lib/python3.10/site-packages/mmdeploy_runtime/libmmdeploy_ort_net.so with error: libmmdeploy.so.1: cannot open shared object file: No such file or directory

But the file exists.

Any idea how I could fix it within your SDK?

NicolasPetermann134 commented 1 year ago

Just saw that you recommend installing onnxruntime 1.8.1 in #2191. I'm using version 1.15.1. Unfortunately, I cannot install 1.8.1:

pip install onnxruntime==1.8.1

ERROR: Could not find a version that satisfies the requirement onnxruntime==1.8.1 (from versions: 1.12.0, 1.12.1, 1.13.1, 1.14.0, 1.14.1, 1.15.0, 1.15.1)
ERROR: No matching distribution found for onnxruntime==1.8.1

irexyc commented 1 year ago

@NicolasPetermann134

https://pypi.org/project/onnxruntime/1.8.1/#files

onnxruntime 1.8.1 only supports Python 3.6 through 3.9.

> I'm facing the same issue. Would appreciate any help a lot.

Are you using the C SDK? Which device are you passing?

NicolasPetermann134 commented 1 year ago

Oh OK, I will downgrade to Python 3.9 then.

No, I'm using the Python SDK. What do you mean by which device I'm passing? I'm following the guide from the RTMPose site: https://github.com/open-mmlab/mmpose/tree/1.x/projects/rtmpose#%EF%B8%8F-how-to-deploy-

NicolasPetermann134 commented 1 year ago

FYI: I still get this error even when I run the SDK with Python 3.8 and onnxruntime 1.8.1.

irexyc commented 1 year ago

@NicolasPetermann134

Sorry for the late reply. Could you print the LD_LIBRARY_PATH environment variable?
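
Note the loader error names libmmdeploy.so.1, a dependency of libmmdeploy_ort_net.so, rather than the file you passed in. A quick way to see which dependencies cannot be resolved (a sketch, using the path from your traceback):

ldd /home/nico/miniconda3/envs/openmmlab/lib/python3.10/site-packages/mmdeploy_runtime/libmmdeploy_ort_net.so | grep 'not found'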

NicolasPetermann134 commented 1 year ago

@irexyc

echo $LD_LIBRARY_PATH
/home/nico/PycharmProjects/deepGym/mmdeploy/onnxruntime-linux-x64-1.8.1/lib:/home/nico/PycharmProjects/deepGym/mmdeploy/onnxruntime-linux-x64-1.8.1/lib:/home/nico/PycharmProjects/deepGym/mmdeploy/onnxruntime-linux-x64-gpu-1.8.1/lib:/home/nico/PycharmProjects/deepGym/mmdeploy/onnxruntime-linux-x64-gpu-1.8.1/lib:/home/nico/PycharmProjects/deepGym/mmdeploy/onnxruntime-linux-x64-gpu-1.8.1/lib:/home/nico/PycharmProjects/deepGym/mmdeploy/onnxruntime-linux-x64-1.8.1/lib:/home/nico/PycharmProjects/deepGym/mmdeploy/cuda/lib64:/home/nico/PycharmProjects/deepGym/mmdeploy/TensorRT-8.2.3.0/lib:

NicolasPetermann134 commented 1 year ago

@irexyc Does that tell you anything? Should it look different?

irexyc commented 1 year ago

@NicolasPetermann134 Sorry for the late reply. The path looks OK.

In your previous reply you said

> I can verify that I can run both the mmdet and mmpose models with the pure onnxruntime API!

Have you checked whether this code works?

import onnxruntime as ort
import numpy as np
session_options = ort.SessionOptions()
session_options.register_custom_ops_library('/path/to/libmmdeploy_ort_net.so') # if the problem occurs when converting the model, the library should be libmmdeploy_onnxruntime_ops.so instead
sess = ort.InferenceSession('/path/to/onnx', session_options, ['CPUExecutionProvider'])
input = np.random.randn(1, 3, 224, 224).astype(np.float32) # b, c, h, w
output = sess.run(['output'], input_feed={'input': input})

If the above code doesn't work, the Python SDK will not work either, and the problem should be in loading the onnxruntime custom ops library. Then we can try building the custom ops library against a newer onnxruntime, as sketched below.

If the above code works, the problem should be in some other part of the SDK.
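
Rebuilding the ops against a newer onnxruntime would look roughly like this (a sketch based on the usual mmdeploy build options; the paths and onnxruntime version are placeholders):

cd mmdeploy && mkdir -p build && cd build
cmake .. -DMMDEPLOY_TARGET_BACKENDS=ort -DONNXRUNTIME_DIR=/path/to/onnxruntime-linux-x64-1.15.1
make -j$(nproc)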

NicolasPetermann134 commented 1 year ago

@irexyc

Yes, I've tried that code and it doesn't work. I've got this error:

Traceback (most recent call last):
  File "/home/nico/PycharmProjects/deepGym/mmdeploy_old/pure_ort_api.py", line 14, in <module>
    session_options.register_custom_ops_library(path_custom_ops) # if you face the problem when convert the model, the lib should be libmmdeploy_onnxruntime_ops.so
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Failed to load library /home/nico/miniconda3/envs/openmmlab/lib/python3.10/site-packages/mmdeploy_runtime/libmmdeploy_ort_net.so with error: libmmdeploy.so.1: cannot open shared object file: No such file or directory

irexyc commented 1 year ago

@NicolasPetermann134 Below is my mmdeploy_runtime installation content, what is yours?

/home/chenxin/miniconda3/envs/torch-1.9.0/lib/python3.8/site-packages/mmdeploy_runtime/
├── __init__.py
├── libmmdeploy_ort_net.so
├── libmmdeploy.so.1
├── libonnxruntime.so.1.8.1
├── mmdeploy_runtime.cpython-38-x86_64-linux-gnu.so
├── __pycache__
├── version.py
└── _win_dll_path.py
NicolasPetermann134 commented 1 year ago

@irexyc
Really sorry for the late reply, I was abroad.

tree /home/nico/miniconda3/envs/mmdeploy/lib/python3.8/site-packages/mmdeploy_runtime
/home/nico/miniconda3/envs/mmdeploy/lib/python3.8/site-packages/mmdeploy_runtime
├── __init__.py
├── libmmdeploy_ort_net.so
├── libmmdeploy.so.1
├── libonnxruntime.so.1.8.1
├── mmdeploy_runtime.cpython-38-x86_64-linux-gnu.so
├── __pycache__
│   ├── __init__.cpython-38.pyc
│   ├── version.cpython-38.pyc
│   └── _win_dll_path.cpython-38.pyc
├── version.py
└── _win_dll_path.py

1 directory, 10 files

irexyc commented 1 year ago

> Yes, I've tried that code and it doesn't work. I've got this error:

> Traceback (most recent call last):
>   File "/home/nico/PycharmProjects/deepGym/mmdeploy_old/pure_ort_api.py", line 14, in <module>
>     session_options.register_custom_ops_library(path_custom_ops) # if you face the problem when convert the model, the lib should be libmmdeploy_onnxruntime_ops.so
> onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Failed to load library /home/nico/miniconda3/envs/openmmlab/lib/python3.10/site-packages/mmdeploy_runtime/libmmdeploy_ort_net.so with error: libmmdeploy.so.1: cannot open shared object file: No such file or directory

What is the content of pure_ort_api.py?

> tree /home/nico/miniconda3/envs/mmdeploy/lib/python3.8/site-packages/mmdeploy_runtime

You listed the mmdeploy_runtime content for Python 3.8. Are you using Python 3.8 or 3.10?

NicolasPetermann134 commented 1 year ago

@irexyc pure_ort_api.py contains your testing code:

import onnxruntime as ort
import numpy as np

path_custom_ops = "/home/nico/miniconda3/envs/mmdeploy/lib/python3.8/site-packages/mmdeploy_runtime/libmmdeploy_ort_net.so"
model_path = 'rtmpose-ort_orig/rtmpose-m/end2end.onnx'

# plain session, no custom ops
sess = ort.InferenceSession(model_path, None, ['CPUExecutionProvider'])
input = np.random.randn(1, 3, 256, 192).astype(np.float32) # b, c, h, w
output = sess.run(['simcc_y'], input_feed={'input': input})

# session with the custom operator library loaded
session_options = ort.SessionOptions()
session_options.register_custom_ops_library(path_custom_ops) # if the problem occurs when converting the model, the lib should be libmmdeploy_onnxruntime_ops.so
sess = ort.InferenceSession(model_path, session_options, ['CPUExecutionProvider'])
input = np.random.randn(1, 3, 256, 192).astype(np.float32) # b, c, h, w
output = sess.run(['simcc_y'], input_feed={'input': input})

To your second question: I started with 3.10 but switched to 3.8. The above code now runs without errors on 3.8. But the SDK error still remains:

(mmdeploy) nico@nico-Z690-AORUS-MASTER:~/PycharmProjects/deepGym/mmdeploy$ build/bin/pose_tracker rtmpose-ort_orig/rtmdet-nano/ rtmpose-ort_orig/rtmpose-m/ /home/nico/Downloads/test_video.mp4 --device cpu --det_interval 5
[2023-07-31 21:31:37.306] [mmdeploy] [info] [model.cpp:35] [DirectoryModel] Load model: "rtmpose-ort_orig/rtmpose-m/"
[2023-07-31 21:31:37.306] [mmdeploy] [info] [model.cpp:35] [DirectoryModel] Load model: "rtmpose-ort_orig/rtmdet-nano/"
[2023-07-31 21:31:37.365] [mmdeploy] [info] [inference.cpp:54] ["img"] <- ["data"]
[2023-07-31 21:31:37.365] [mmdeploy] [info] [inference.cpp:65] ["post_output"] -> ["dets"]
[2023-07-31 21:31:37.451] [mmdeploy] [info] [inference.cpp:54] ["img"] <- ["rois"]
[2023-07-31 21:31:37.451] [mmdeploy] [info] [inference.cpp:65] ["post_output"] -> ["keypoints"]
/opt/rh/devtoolset-9/root/usr/include/c++/9/bits/stl_vector.h:1042: std::vector<_Tp, _Alloc>::reference std::vector<_Tp, _Alloc>::operator[](std::vector<_Tp, _Alloc>::size_type) [with _Tp = int; _Alloc = std::allocator<int>; std::vector<_Tp, _Alloc>::reference = int&; std::vector<_Tp, _Alloc>::size_type = long unsigned int]: Assertion '__builtin_expect(__n < this->size(), true)' failed.
Aborted (core dumped)

irexyc commented 1 year ago

Sorry for the late reply.

> The above code now runs without errors on 3.8

Which onnxruntime version are you using? Could you confirm that both onnxruntime 1.8.1 and 1.15.1 work?

NicolasPetermann134 commented 1 year ago

@irexyc Yes, your testing code now runs without error, but the error with the SDK still remains.

irexyc commented 1 year ago

@NicolasPetermann134 Sorry to bother you again. I'm a little confused now, so I want to make things clear.

  1. Does pure onnxruntime 1.8.1 (Python API), without loading libmmdeploy_ort_net.so, run inference without error?

  2. Does pure onnxruntime 1.8.1 (Python API), with libmmdeploy_ort_net.so loaded, run inference without error?

  3. Does pure onnxruntime 1.15.1 (Python API), without loading libmmdeploy_ort_net.so, run inference without error?

  4. Does pure onnxruntime 1.15.1 (Python API), with libmmdeploy_ort_net.so loaded, run inference without error?
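
A minimal script to check all four cases (a sketch; MODEL and OPS_LIB are placeholders, and it has to be run once in the onnxruntime 1.8.1 environment and once in the 1.15.1 environment):

import numpy as np
import onnxruntime as ort

MODEL = '/path/to/end2end.onnx'              # placeholder
OPS_LIB = '/path/to/libmmdeploy_ort_net.so'  # placeholder

def try_infer(with_custom_ops):
    opts = ort.SessionOptions()
    if with_custom_ops:
        opts.register_custom_ops_library(OPS_LIB)
    sess = ort.InferenceSession(MODEL, opts, ['CPUExecutionProvider'])
    inp = sess.get_inputs()[0]
    shape = [d if isinstance(d, int) else 1 for d in inp.shape]  # substitute 1 for dynamic dims
    sess.run(None, {inp.name: np.random.randn(*shape).astype(np.float32)})

for with_ops in (False, True):
    try:
        try_infer(with_ops)
        print(f'onnxruntime {ort.__version__}, custom ops loaded={with_ops}: OK')
    except Exception as e:
        print(f'onnxruntime {ort.__version__}, custom ops loaded={with_ops}: FAILED ({e})')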