open-mmlab / mmdeploy

OpenMMLab Model Deployment Framework
https://mmdeploy.readthedocs.io/en/latest/
Apache License 2.0

[Bug] Error exporting the Yoloxpose model to TensorRT format #2500

Closed wenkaiH closed 1 year ago

wenkaiH commented 1 year ago


Describe the bug

10/17 15:23:31 - mmengine - WARNING - Failed to search registry with scope "mmpose" in the "Codebases" registry tree. As a workaround, the current "Codebases" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmpose" is a correct scope, or whether the registry is initialized.
10/17 15:23:31 - mmengine - WARNING - Failed to search registry with scope "mmpose" in the "mmpose_tasks" registry tree. As a workaround, the current "mmpose_tasks" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmpose" is a correct scope, or whether the registry is initialized.
10/17 15:23:33 - mmengine - INFO - Start pipeline mmdeploy.apis.pytorch2onnx.torch2onnx in subprocess
10/17 15:23:34 - mmengine - WARNING - Failed to search registry with scope "mmpose" in the "Codebases" registry tree. As a workaround, the current "Codebases" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmpose" is a correct scope, or whether the registry is initialized.
10/17 15:23:34 - mmengine - WARNING - Failed to search registry with scope "mmpose" in the "mmpose_tasks" registry tree. As a workaround, the current "mmpose_tasks" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmpose" is a correct scope, or whether the registry is initialized.
/home/ccnu-train/anaconda3/envs/mmdeploy/lib/python3.8/site-packages/mmpose/datasets/datasets/utils.py:102: UserWarning: The metainfo config file "configs/_base_/datasets/coco.py" does not exist. A matched config file "/home/ccnu-train/anaconda3/envs/mmdeploy/lib/python3.8/site-packages/mmpose/.mim/configs/_base_/datasets/coco.py" will be used instead.
  warnings.warn(
Loads checkpoint by http backend from path: https://download.openmmlab.com/mmpose/v1/body_2d_keypoint/yolox_pose/yoloxpose_m_8xb32-300e_coco-640-84e9a538_20230829.pth
10/17 15:23:42 - mmengine - WARNING - DeprecationWarning: get_onnx_config will be deprecated in the future.
10/17 15:23:42 - mmengine - INFO - Export PyTorch model to ONNX: mmdeploy-model/yoloxpose-trt/end2end.onnx.
10/17 15:23:43 - mmengine - WARNING - Can not find torch.nn.functional.scaled_dot_product_attention, function rewrite will not be applied
10/17 15:23:43 - mmengine - WARNING - Can not find torch._C._jit_pass_onnx_autograd_function_process, function rewrite will not be applied
10/17 15:23:43 - mmengine - WARNING - Can not find torch._C._jit_pass_onnx_deduplicate_initializers, function rewrite will not be applied
10/17 15:23:43 - mmengine - WARNING - Can not find mmdet.models.utils.transformer.PatchMerging.forward, function rewrite will not be applied
/home/ccnu-train/anaconda3/envs/mmdeploy/lib/python3.8/site-packages/torch/functional.py:445: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:2157.)
  return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
/home/ccnu-train/hwk/mmdeploy/mmdeploy/core/optimizers/function_marker.py:160: TracerWarning: Converting a tensor to a Python integer might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  ys_shape = tuple(int(s) for s in ys.shape)
/home/ccnu-train/hwk/mmdeploy/mmdeploy/mmcv/ops/nms.py:475: TracerWarning: Converting a tensor to a Python integer might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  int(scores.shape[-1]),
/home/ccnu-train/hwk/mmdeploy/mmdeploy/mmcv/ops/nms.py:149: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  out_boxes = min(num_boxes, after_topk)
WARNING: The shape inference of mmdeploy::TRTBatchedNMS type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
WARNING: The shape inference of mmdeploy::TRTBatchedNMS type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
WARNING: The shape inference of mmdeploy::TRTBatchedNMS type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
/home/ccnu-train/anaconda3/envs/mmdeploy/lib/python3.8/site-packages/torch/onnx/symbolic_opset9.py:2815: UserWarning: Exporting aten::index operator of advanced indexing in opset 11 is achieved by combination of multiple ONNX operators, including Reshape, Transpose, Concat, and Gather. If indices include negative values, the exported graph will produce incorrect results.
  warnings.warn("Exporting aten::index operator of advanced indexing in opset " +
WARNING: The shape inference of mmdeploy::TRTBatchedNMS type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
WARNING: The shape inference of mmdeploy::TRTBatchedNMS type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
WARNING: The shape inference of mmdeploy::TRTBatchedNMS type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
WARNING: The shape inference of mmdeploy::TRTBatchedNMS type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
WARNING: The shape inference of mmdeploy::TRTBatchedNMS type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
WARNING: The shape inference of mmdeploy::TRTBatchedNMS type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
10/17 15:23:58 - mmengine - INFO - Execute onnx optimize passes.
10/17 15:23:58 - mmengine - INFO - Finish pipeline mmdeploy.apis.pytorch2onnx.torch2onnx
10/17 15:24:01 - mmengine - INFO - Start pipeline mmdeploy.apis.utils.utils.to_backend in subprocess
10/17 15:24:01 - mmengine - WARNING - Could not load the library of tensorrt plugins. Because the file does not exist:
[10/17/2023-15:24:02] [TRT] [I] [MemUsageChange] Init CUDA: CPU +14, GPU +0, now: CPU 101, GPU 18991 (MiB)
[10/17/2023-15:24:05] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +546, GPU +118, now: CPU 702, GPU 19109 (MiB)
[10/17/2023-15:24:05] [TRT] [I] ----------------------------------------------------------------
[10/17/2023-15:24:05] [TRT] [I] Input filename: mmdeploy-model/yoloxpose-trt/end2end.onnx
[10/17/2023-15:24:05] [TRT] [I] ONNX IR version: 0.0.7
[10/17/2023-15:24:05] [TRT] [I] Opset version: 11
[10/17/2023-15:24:05] [TRT] [I] Producer name: pytorch
[10/17/2023-15:24:05] [TRT] [I] Producer version: 1.10
[10/17/2023-15:24:05] [TRT] [I] Domain:
[10/17/2023-15:24:05] [TRT] [I] Model version: 0
[10/17/2023-15:24:05] [TRT] [I] Doc string:
[10/17/2023-15:24:05] [TRT] [I] ----------------------------------------------------------------
[10/17/2023-15:24:05] [TRT] [W] onnx2trt_utils.cpp:377: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[10/17/2023-15:24:05] [TRT] [W] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[10/17/2023-15:24:06] [TRT] [I] No importer registered for op: TRTBatchedNMS. Attempting to import as plugin.
[10/17/2023-15:24:06] [TRT] [I] Searching for plugin: TRTBatchedNMS, plugin_version: 1, plugin_namespace:
[10/17/2023-15:24:06] [TRT] [E] ModelImporter.cpp:726: While parsing node number 656 [TRTBatchedNMS -> "1741"]:
[10/17/2023-15:24:06] [TRT] [E] ModelImporter.cpp:727: --- Begin node ---
[10/17/2023-15:24:06] [TRT] [E] ModelImporter.cpp:728: input: "1740"
input: "1722"
output: "1741"
output: "1742"
output: "1743"
name: "TRTBatchedNMS_656"
op_type: "TRTBatchedNMS"
attribute { name: "background_label_id" i: -1 type: INT }
attribute { name: "clip_boxes" i: 0 type: INT }
attribute { name: "iou_threshold" f: 0.65 type: FLOAT }
attribute { name: "is_normalized" i: 0 type: INT }
attribute { name: "keep_topk" i: 100 type: INT }
attribute { name: "num_classes" i: 1 type: INT }
attribute { name: "return_index" i: 1 type: INT }
attribute { name: "score_threshold" f: 0.5 type: FLOAT }
attribute { name: "topk" i: 5000 type: INT }
domain: "mmdeploy"

[10/17/2023-15:24:06] [TRT] [E] ModelImporter.cpp:729: --- End node ---
[10/17/2023-15:24:06] [TRT] [E] ModelImporter.cpp:732: ERROR: builtin_op_importers.cpp:5428 In function importFallbackPluginImporter: [8] Assertion failed: creator && "Plugin not found, are the plugin name, version, and namespace correct?"
Process Process-3:
Traceback (most recent call last):
  File "/home/ccnu-train/anaconda3/envs/mmdeploy/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/ccnu-train/anaconda3/envs/mmdeploy/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ccnu-train/hwk/mmdeploy/mmdeploy/apis/core/pipeline_manager.py", line 107, in __call__
    ret = func(*args, **kwargs)
  File "/home/ccnu-train/hwk/mmdeploy/mmdeploy/apis/utils/utils.py", line 98, in to_backend
    return backend_mgr.to_backend(
  File "/home/ccnu-train/hwk/mmdeploy/mmdeploy/backend/tensorrt/backend_manager.py", line 127, in to_backend
    onnx2tensorrt(
  File "/home/ccnu-train/hwk/mmdeploy/mmdeploy/backend/tensorrt/onnx2tensorrt.py", line 79, in onnx2tensorrt
    from_onnx(
  File "/home/ccnu-train/hwk/mmdeploy/mmdeploy/backend/tensorrt/utils.py", line 185, in from_onnx
    raise RuntimeError(f'Failed to parse onnx, {error_msgs}')
RuntimeError: Failed to parse onnx, In node 656 (importFallbackPluginImporter): UNSUPPORTED_NODE: Assertion failed: creator && "Plugin not found, are the plugin name, version, and namespace correct?"

Reproduction

python tools/deploy.py \
    configs/mmpose/pose-detection_yolox-pose_tensorrt_dynamic-640x640.py \
    ../mmpose/configs/body_2d_keypoint/yoloxpose/coco/yoloxpose_m_8xb32-300e_coco-640.py \
    https://download.openmmlab.com/mmpose/v1/body_2d_keypoint/yolox_pose/yoloxpose_m_8xb32-300e_coco-640-84e9a538_20230829.pth \
    demo/resources/human-pose.jpg \
    --work-dir mmdeploy-model/yoloxpose-trt \
    --device cuda \
    --show \
    --dump-info

Environment

10/17 15:50:12 - mmengine - INFO -

10/17 15:50:12 - mmengine - INFO - **********Environmental information**********
10/17 15:50:13 - mmengine - INFO - sys.platform: linux
10/17 15:50:13 - mmengine - INFO - Python: 3.8.18 (default, Sep 11 2023, 13:40:15) [GCC 11.2.0]
10/17 15:50:13 - mmengine - INFO - CUDA available: True
10/17 15:50:13 - mmengine - INFO - numpy_random_seed: 2147483648
10/17 15:50:13 - mmengine - INFO - GPU 0,1: NVIDIA RTX A6000
10/17 15:50:13 - mmengine - INFO - CUDA_HOME: /usr
10/17 15:50:13 - mmengine - INFO - NVCC: Cuda compilation tools, release 11.5, V11.5.119
10/17 15:50:13 - mmengine - INFO - GCC: gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
10/17 15:50:13 - mmengine - INFO - PyTorch: 1.10.2+cu113
10/17 15:50:13 - mmengine - INFO - PyTorch compiling details: PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.2.3 (Git Hash 7336ca9f055cf1bfa13efb658fe15dc9b41f0740)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX512
  - CUDA Runtime 11.3
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86
  - CuDNN 8.2
  - Magma 2.5.2
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.3, CUDNN_VERSION=8.2.0, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.10.2, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,

10/17 15:50:13 - mmengine - INFO - TorchVision: 0.11.3+cu113
10/17 15:50:13 - mmengine - INFO - OpenCV: 4.8.1
10/17 15:50:13 - mmengine - INFO - MMEngine: 0.8.4
10/17 15:50:13 - mmengine - INFO - MMCV: 2.0.1
10/17 15:50:13 - mmengine - INFO - MMCV Compiler: GCC 9.3
10/17 15:50:13 - mmengine - INFO - MMCV CUDA Compiler: 11.3
10/17 15:50:13 - mmengine - INFO - MMDeploy: 1.3.0+c4dc10d
10/17 15:50:13 - mmengine - INFO -

10/17 15:50:13 - mmengine - INFO - **********Backend information**********
10/17 15:50:13 - mmengine - INFO - tensorrt:    8.5.3.1
10/17 15:50:13 - mmengine - INFO - tensorrt custom ops: NotAvailable
10/17 15:50:13 - mmengine - INFO - ONNXRuntime: None
10/17 15:50:13 - mmengine - INFO - ONNXRuntime-gpu:     1.15.1
10/17 15:50:13 - mmengine - INFO - ONNXRuntime custom ops:      NotAvailable
10/17 15:50:13 - mmengine - INFO - pplnn:       None
10/17 15:50:13 - mmengine - INFO - ncnn:        None
10/17 15:50:13 - mmengine - INFO - snpe:        None
10/17 15:50:13 - mmengine - INFO - openvino:    None
10/17 15:50:13 - mmengine - INFO - torchscript: 1.10.2+cu113
10/17 15:50:13 - mmengine - INFO - torchscript custom ops:      NotAvailable
10/17 15:50:13 - mmengine - INFO - rknn-toolkit:        None
10/17 15:50:13 - mmengine - INFO - rknn-toolkit2:       None
10/17 15:50:13 - mmengine - INFO - ascend:      None
10/17 15:50:13 - mmengine - INFO - coreml:      None
10/17 15:50:13 - mmengine - INFO - tvm: None
10/17 15:50:13 - mmengine - INFO - vacc:        None
10/17 15:50:13 - mmengine - INFO -

10/17 15:50:13 - mmengine - INFO - **********Codebase information**********
10/17 15:50:13 - mmengine - INFO - mmdet:       3.0.0
10/17 15:50:13 - mmengine - INFO - mmseg:       None
10/17 15:50:13 - mmengine - INFO - mmpretrain:  None
10/17 15:50:13 - mmengine - INFO - mmocr:       None
10/17 15:50:13 - mmengine - INFO - mmagic:      None
10/17 15:50:13 - mmengine - INFO - mmdet3d:     None
10/17 15:50:13 - mmengine - INFO - mmpose:      1.2.0
10/17 15:50:13 - mmengine - INFO - mmrotate:    None
10/17 15:50:13 - mmengine - INFO - mmaction:    None
10/17 15:50:13 - mmengine - INFO - mmrazor:     None
10/17 15:50:13 - mmengine - INFO - mmyolo:      0.5.0

Error traceback

No response
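
Two details in the report above seem to point the same way: the deploy log warns that the TensorRT plugin library could not be loaded just before the parser fails on TRTBatchedNMS, and the environment dump lists "tensorrt custom ops: NotAvailable". The sketch below is only a reading aid, not part of MMDeploy: `diagnose_trt_plugin_failure` and the two excerpt constants are hypothetical names, and the excerpts are lines copied from this report.

```python
import re

# Lines copied from the deploy log in this report (excerpt only).
BUILD_LOG_EXCERPT = """\
10/17 15:24:01 - mmengine - WARNING - Could not load the library of tensorrt plugins. Because the file does not exist:
[10/17/2023-15:24:06] [TRT] [I] Searching for plugin: TRTBatchedNMS, plugin_version: 1, plugin_namespace:
[10/17/2023-15:24:06] [TRT] [E] ModelImporter.cpp:732: ERROR: builtin_op_importers.cpp:5428 In function importFallbackPluginImporter: [8] Assertion failed: creator && "Plugin not found, are the plugin name, version, and namespace correct?"
"""

# Line copied from the environment dump in this report.
ENV_EXCERPT = "10/17 15:50:13 - mmengine - INFO - tensorrt custom ops: NotAvailable"


def diagnose_trt_plugin_failure(build_log: str, env_report: str) -> dict:
    """Collect the plugin-related evidence from a deploy log and an env report.

    Hypothetical helper: it only matches text, it does not touch TensorRT.
    """
    # Ops that the ONNX parser tried to resolve as TensorRT plugins.
    searched = re.findall(r"Searching for plugin: (\w+)", build_log)
    return {
        "plugins_searched": sorted(set(searched)),
        "plugin_not_found": "Plugin not found" in build_log,
        "plugin_library_missing": (
            "Could not load the library of tensorrt plugins" in build_log
        ),
        "custom_ops_unavailable": re.search(
            r"tensorrt custom ops:\s*NotAvailable", env_report
        )
        is not None,
    }


if __name__ == "__main__":
    for key, value in diagnose_trt_plugin_failure(
        BUILD_LOG_EXCERPT, ENV_EXCERPT
    ).items():
        print(f"{key}: {value}")
```

If all three flags come back true, the likely cause is that the custom-op library was never built or could not be located, rather than anything specific to the Yoloxpose config. That is an inference from the log text, not something confirmed by the maintainers in this thread.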

wenkaiH commented 1 year ago

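One more small question: before this, I had already deployed rtmpose, exported it to TensorRT format, and tested inference successfully (so this should not be an environment setup problem, right?). Exporting Yoloxpose to ONNX format succeeds, but I can only test inference through the API; SDK inference does not work.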

[2023-10-17 15:59:07.788] [mmdeploy] [info] [model.cpp:35] [DirectoryModel] Load model: "mmdeploy-model/yoloxpose-ort/"
[2023-10-17 15:59:08.195] [mmdeploy] [error] [compose.cpp:37] Unable to find Transform creator: BottomupResize. Available transforms: [("CenterCrop", 0), ("Collect", 0), ("Compose", 0), ("DefaultFormatBundle", 0), ("FormatShape", 0), ("ImageToTensor", 0), ("LetterResize", 0), ("Lift", 0), ("LoadImageFromFile", 0), ("Normalize", 0), ("Pad", 0), ("RescaleToHeight", 0), ("Resize", 0), ("ResizeOCR", 0), ("ShortScaleAspectJitter", 0), ("TenCrop", 0), ("ThreeCrop", 0), ("TopDownAffine", 0), ("TopDownGetBboxCenterScale", 0)]
[2023-10-17 15:59:08.196] [mmdeploy] [error] [task.cpp:99] error parsing config: { "context": { "device": "", "model": "", "stream": "" }, "input": [ "img" ], "module": "Transform", "name": "Preprocess", "output": [ "prep_output" ], "transforms": [ { "type": "LoadImageFromFile" }, { "input_size": [ 640, 640 ], "pad_val": [ 114, 114, 114 ], "type": "BottomupResize" }, { "mean": [ 0, 0, 0 ], "std": [ 1, 1, 1 ], "to_rgb": false, "type": "Normalize" }, { "keys": [ "img" ], "type": "ImageToTensor" }, { "keys": [ "img" ], "meta_keys": [ "img_shape", "pad_shape", "ori_shape", "img_norm_cfg", "scale_factor", "bbox_score", "center", "scale" ], "type": "Collect" } ], "type": "Task" }
[2023-10-17 15:59:08.732] [mmdeploy] [error] [common.h:50] Could not found entry 'UNKNOWN' in mmpose. Available components: [("DeepposeRegressionHeadDecode", 0), ("SimCCLabelDecode", 0), ("TopdownHeatmapBaseHeadDecode", 0), ("TopdownHeatmapMSMUHeadDecode", 0), ("TopdownHeatmapMultiStageHeadDecode", 0), ("TopdownHeatmapSimpleHeadDecode", 0), ("ViPNASHeatmapSimpleHeadDecode", 0)]
[2023-10-17 15:59:08.732] [mmdeploy] [error] [task.cpp:99] error parsing config: { "component": "UNKNOWN", "context": { "device": "", "model": "", "stream": "" }, "input": [ "prep_output", "infer_output" ], "module": "mmpose", "name": "postprocess", "output": [ "post_output" ], "params": { "flip_test": false, "input_size": [ 640, 640 ], "nms_thr": 0.65, "score_thr": 0.5, "type": "YOLOXPoseAnnotationProcessor" }, "type": "Task" }
Segmentation fault (core dumped)
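
The SDK log above already contains what is needed to see which preprocessing step has no SDK implementation: it prints the transforms registered in this SDK build, and the failing pipeline config lists the transforms the model asks for. A minimal sketch of that comparison follows; `available_transforms` and `unsupported_transforms` are hypothetical helpers that only parse log text, and the sample strings are copied from the log in this comment.

```python
import re

# Copied from the SDK error line in this comment.
SDK_LOG_EXCERPT = (
    'Unable to find Transform creator: BottomupResize. Available transforms: '
    '[("CenterCrop", 0), ("Collect", 0), ("Compose", 0), '
    '("DefaultFormatBundle", 0), ("FormatShape", 0), ("ImageToTensor", 0), '
    '("LetterResize", 0), ("Lift", 0), ("LoadImageFromFile", 0), '
    '("Normalize", 0), ("Pad", 0), ("RescaleToHeight", 0), ("Resize", 0), '
    '("ResizeOCR", 0), ("ShortScaleAspectJitter", 0), ("TenCrop", 0), '
    '("ThreeCrop", 0), ("TopDownAffine", 0), ("TopDownGetBboxCenterScale", 0)]'
)

# "type" fields of the "transforms" list in the failing Preprocess config.
PIPELINE_TYPES = [
    "LoadImageFromFile",
    "BottomupResize",
    "Normalize",
    "ImageToTensor",
    "Collect",
]


def available_transforms(sdk_log: str) -> list:
    """Extract the transform names the SDK reports as registered."""
    match = re.search(r"Available transforms: \[(.*?)\]", sdk_log, re.S)
    if match is None:
        return []
    # Each entry is printed as ("Name", version).
    return re.findall(r'\("(\w+)", \d+\)', match.group(1))


def unsupported_transforms(pipeline_types, sdk_log: str) -> list:
    """Return the pipeline transforms that the SDK log does not list."""
    available = set(available_transforms(sdk_log))
    return [name for name in pipeline_types if name not in available]


if __name__ == "__main__":
    print(unsupported_transforms(PIPELINE_TYPES, SDK_LOG_EXCERPT))
```

For this log the only entry left over is `BottomupResize`. The second error (component 'UNKNOWN' in mmpose) is a separate gap on the postprocessing side and is not covered by this check.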

RunningLeon commented 1 year ago

@wenkaiH hi, yolox-pose is not supported in sdk. You can try this PR: https://github.com/open-mmlab/mmdeploy/pull/2240

wenkaiH commented 1 year ago

@wenkaiH hi, yolox-pose is not supported in sdk. You can try this PR: #2240

I have already tried modifying the twenty-odd files mentioned there and ran pip install -e mmdeploy, but it still failed.

RunningLeon commented 1 year ago

you need to rebuild mmdeploy.

wenkaiH commented 1 year ago

you need to rebuild mmdeploy.

Isn't it rebuilt via pip install -e {dir}/mmdeploy?

RunningLeon commented 1 year ago

Yes. You have to build from source: https://mmdeploy.readthedocs.io/en/latest/01-how-to-build/build_from_source.html
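
After a source build, one quick sanity check is to confirm that a compiled custom-op library actually exists in the checkout and that the dynamic loader accepts it. The sketch below assumes the library file name starts with `libmmdeploy_tensorrt_ops` and that it sits somewhere under the checkout directory; both are assumptions to verify against your own build output, and `find_custom_ops` and `try_load` are hypothetical helpers, not MMDeploy APIs.

```python
import ctypes
import sys
from pathlib import Path

# Assumed file-name prefix of the TensorRT custom-op library; adjust if your
# build produces a different name.
ASSUMED_PATTERN = "libmmdeploy_tensorrt_ops*"


def find_custom_ops(root: str, pattern: str = ASSUMED_PATTERN) -> list:
    """Recursively list files under `root` whose name matches `pattern`."""
    return sorted(str(p) for p in Path(root).rglob(pattern) if p.is_file())


def try_load(path: str) -> bool:
    """Return True if the dynamic loader can open the shared library."""
    try:
        ctypes.CDLL(path)
        return True
    except OSError:
        # Missing dependency (e.g. TensorRT or CUDA libraries not on the
        # loader path) or not a valid shared object.
        return False


if __name__ == "__main__":
    # Usage: python check_ops.py /path/to/mmdeploy
    checkout = sys.argv[1] if len(sys.argv) > 1 else "."
    candidates = find_custom_ops(checkout)
    if not candidates:
        print("No custom-op library found; the C++ part may not be built.")
    for lib in candidates:
        print(lib, "loads" if try_load(lib) else "fails to load")
```

An empty result would be consistent with the "Could not load the library of tensorrt plugins" warning in the original report. An editable `pip install` on its own typically installs only the Python package, which may be why it did not change the outcome here.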

github-actions[bot] commented 1 year ago

This issue is marked as stale because it has been marked as invalid or awaiting response for 7 days without any further response. It will be closed in 5 days if the stale label is not removed or if there is no further response.

github-actions[bot] commented 1 year ago

This issue is closed because it has been stale for 5 days. Please open a new issue if you have similar issues or you have any new updates now.