open-mmlab / mmdeploy

OpenMMLab Model Deployment Framework
https://mmdeploy.readthedocs.io/en/latest/
Apache License 2.0

[Bug] deploy.py on 1.x branch fails due to VACC dependency #1836

Open xduris1 opened 1 year ago

xduris1 commented 1 year ago


Describe the bug

When running the deploy.py script with the following command:

python tools/deploy.py \
    configs/mmdet/instance-seg/instance-seg_rtmdet-ins_tensorrt_static-640x640.py \
    ${PATH_TO_MMDET}/configs/rtmdet/rtmdet-ins_s_8xb32-300e_coco.py \
    checkpoint/rtmdet-ins_s_8xb32-300e_coco/rtmdet-ins_s_8xb32-300e_coco_20221121_212604-fdc5d7ec.pth \
    demo/resources/det.jpg \
    --work-dir ./work_dirs/rtmdet-ins \
    --device cuda:0 \
    --show

the script fails due to an unhandled dependency on VACC in the backend enums.

Possible fix: comment out lines 229 - 253 in deploy.py until the issue is resolved; the script then finishes successfully.
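A less invasive workaround might be to guard the enum lookup instead of commenting out the whole block. This is only a sketch, assuming the failing statement is the "if backend == Backend.VACC:" comparison shown in the error traceback below; the surrounding variable names are taken from that traceback, not from the file itself:

# Hypothetical patch sketch for tools/deploy.py around line 229.
from mmdeploy.utils import Backend

# getattr with a default returns None on installs whose Backend enum has no VACC member yet.
vacc = getattr(Backend, 'VACC', None)
if vacc is not None and backend == vacc:
    # ... the original VACC-specific conversion steps (lines 229 - 253) ...
    pass

With such a guard, the VACC branch would simply be skipped on installations that predate the VACC backend.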

Reproduction

Run the following on branch 1.x of mmdeploy:

python tools/deploy.py \
    configs/mmdet/instance-seg/instance-seg_rtmdet-ins_tensorrt_static-640x640.py \
    ${PATH_TO_MMDET}/configs/rtmdet/rtmdet-ins_s_8xb32-300e_coco.py \
    checkpoint/rtmdet-ins_s_8xb32-300e_coco/rtmdet-ins_s_8xb32-300e_coco_20221121_212604-fdc5d7ec.pth \
    demo/resources/det.jpg \
    --work-dir ./work_dirs/rtmdet-ins \
    --device cuda:0 \
    --show

Environment

03/06 10:01:34 - mmengine - INFO - 

03/06 10:01:34 - mmengine - INFO - **********Environmental information**********
03/06 10:01:35 - mmengine - INFO - sys.platform: linux
03/06 10:01:35 - mmengine - INFO - Python: 3.8.16 (default, Jan 17 2023, 23:13:24) [GCC 11.2.0]
03/06 10:01:35 - mmengine - INFO - CUDA available: True
03/06 10:01:35 - mmengine - INFO - numpy_random_seed: 2147483648
03/06 10:01:35 - mmengine - INFO - GPU 0: NVIDIA TITAN RTX
03/06 10:01:35 - mmengine - INFO - CUDA_HOME: /usr/local/cuda
03/06 10:01:35 - mmengine - INFO - NVCC: Cuda compilation tools, release 11.6, V11.6.124
03/06 10:01:35 - mmengine - INFO - GCC: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
03/06 10:01:35 - mmengine - INFO - PyTorch: 1.10.0
03/06 10:01:35 - mmengine - INFO - PyTorch compiling details: PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) oneAPI Math Kernel Library Version 2022.1-Product Build 20220311 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.2.3 (Git Hash 7336ca9f055cf1bfa13efb658fe15dc9b41f0740)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 11.3
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
  - CuDNN 8.2
  - Magma 2.5.2
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.3, CUDNN_VERSION=8.2.0, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.10.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, 

03/06 10:01:35 - mmengine - INFO - TorchVision: 0.11.0
03/06 10:01:35 - mmengine - INFO - OpenCV: 4.7.0
03/06 10:01:35 - mmengine - INFO - MMEngine: 0.6.0
03/06 10:01:35 - mmengine - INFO - MMCV: 2.0.0rc4
03/06 10:01:35 - mmengine - INFO - MMCV Compiler: GCC 9.3
03/06 10:01:35 - mmengine - INFO - MMCV CUDA Compiler: 11.3
03/06 10:01:35 - mmengine - INFO - MMDeploy: 1.0.0rc3+ec4abe5
03/06 10:01:35 - mmengine - INFO - 

03/06 10:01:35 - mmengine - INFO - **********Backend information**********
03/06 10:01:35 - mmengine - INFO - tensorrt:    8.2.4.2
03/06 10:01:35 - mmengine - INFO - tensorrt custom ops: Available
03/06 10:01:35 - mmengine - INFO - ONNXRuntime: None
03/06 10:01:35 - mmengine - INFO - ONNXRuntime-gpu:     1.8.1
03/06 10:01:35 - mmengine - INFO - ONNXRuntime custom ops:      NotAvailable
03/06 10:01:35 - mmengine - INFO - pplnn:       None
03/06 10:01:35 - mmengine - INFO - ncnn:        None
03/06 10:01:35 - mmengine - INFO - snpe:        None
03/06 10:01:35 - mmengine - INFO - openvino:    None
03/06 10:01:35 - mmengine - INFO - torchscript: 1.10.0
03/06 10:01:35 - mmengine - INFO - torchscript custom ops:      NotAvailable
03/06 10:01:35 - mmengine - INFO - rknn-toolkit:        None
03/06 10:01:35 - mmengine - INFO - rknn-toolkit2:       None
03/06 10:01:35 - mmengine - INFO - ascend:      None
03/06 10:01:35 - mmengine - INFO - coreml:      None
03/06 10:01:35 - mmengine - INFO - tvm: None
03/06 10:01:35 - mmengine - INFO - 

03/06 10:01:35 - mmengine - INFO - **********Codebase information**********
03/06 10:01:35 - mmengine - INFO - mmdet:       3.0.0rc5
03/06 10:01:35 - mmengine - INFO - mmseg:       None
03/06 10:01:35 - mmengine - INFO - mmcls:       None
03/06 10:01:35 - mmengine - INFO - mmocr:       None
03/06 10:01:35 - mmengine - INFO - mmedit:      None
03/06 10:01:35 - mmengine - INFO - mmdet3d:     None
03/06 10:01:35 - mmengine - INFO - mmpose:      None
03/06 10:01:35 - mmengine - INFO - mmrotate:    None
03/06 10:01:35 - mmengine - INFO - mmaction:    None

Error traceback

Traceback (most recent call last):
  File "tools/deploy.py", line 335, in <module>
    main()
  File "tools/deploy.py", line 229, in main
    if backend == Backend.VACC:
  File "/opt/conda/lib/python3.8/enum.py", line 384, in __getattr__
    raise AttributeError(name) from None
AttributeError: VACC
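
For context, this error suggests that the mmdeploy package actually being imported is older than the checked-out tools/deploy.py, i.e. its Backend enum has no VACC member yet. A minimal, self-contained illustration of that failure mode (not mmdeploy code):

from enum import Enum

class Backend(Enum):  # stand-in for an older Backend enum that predates VACC
    TENSORRT = 'tensorrt'
    ONNXRUNTIME = 'onnxruntime'

try:
    Backend.VACC  # looking up a missing Enum member raises AttributeError, as in the traceback above
except AttributeError as err:
    print('AttributeError:', err)  # prints: AttributeError: VACC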
lvhan028 commented 1 year ago

You may try to reinstall mmdeploy:

cd mmdeploy
pip install -v -e .
xduris1 commented 1 year ago

I tried reinstalling mmdeploy using pip install -v -e . on branch 1.x, commit f69c636 (mmdeploy version 1.0.0rc3). After reinstallation I am getting this error:

[8] Assertion failed: creator && "Plugin not found, are the plugin name, version, and namespace correct?"
Process Process-3:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/opt/conda/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/data/rtmdet_conversion/mmdeploy/mmdeploy/apis/core/pipeline_manager.py", line 107, in __call__
    ret = func(*args, **kwargs)
  File "/data/rtmdet_conversion/mmdeploy/mmdeploy/apis/utils/utils.py", line 98, in to_backend
    return backend_mgr.to_backend(
  File "/data/rtmdet_conversion/mmdeploy/mmdeploy/backend/tensorrt/backend_manager.py", line 127, in to_backend
    onnx2tensorrt(
  File "/data/rtmdet_conversion/mmdeploy/mmdeploy/backend/tensorrt/onnx2tensorrt.py", line 79, in onnx2tensorrt
    from_onnx(
  File "/data/rtmdet_conversion/mmdeploy/mmdeploy/backend/tensorrt/utils.py", line 180, in from_onnx
    raise RuntimeError(f'Failed to parse onnx, {error_msgs}')
RuntimeError: Failed to parse onnx, In node 383 (importFallbackPluginImporter): UNSUPPORTED_NODE: Assertion failed: creator && "Plugin not found, are the plugin name, version, and namespace correct?"

03/06 11:48:09 - mmengine - ERROR - /data/rtmdet_conversion/mmdeploy/mmdeploy/apis/core/pipeline_manager.py - pop_mp_output - 80 - `mmdeploy.apis.utils.utils.to_backend` with Call id: 1 failed. exit.

I think this is because the installation from the repository does not contain the plugins needed for TensorRT conversion. These are provided in the prebuilt package from https://github.com/open-mmlab/mmdeploy/releases/download/v1.0.0rc3/mmdeploy-1.0.0rc3-linux-x86_64-cuda11.1-tensorrt8.2.3.0.tar.gz; however, that is the version with the aforementioned VACC issue.
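A quick way to check whether a from-source build actually produced the TensorRT custom-op library is to try loading it directly. The library name and search paths below are assumptions based on the usual build layout, so adjust them to your setup:

import ctypes
import glob

# Assumed locations of the TensorRT custom-op library (hypothetical paths).
candidates = (glob.glob('mmdeploy/lib/libmmdeploy_tensorrt_ops.so') +
              glob.glob('build/lib/libmmdeploy_tensorrt_ops.so'))

if not candidates:
    print('TensorRT custom-op library not found; the custom ops were probably never built.')
else:
    ctypes.CDLL(candidates[0])  # raises OSError if the library exists but cannot be loaded
    print('Loaded TensorRT custom ops from', candidates[0])

If the library is missing, the custom ops have to be built from source with the TensorRT backend enabled (see the mmdeploy build-from-source documentation) before the ONNX-to-TensorRT step can resolve the plugins.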

lvhan028 commented 1 year ago

I built mmdeploy from source and installed it with pip install -e . That way, I could not reproduce your issue. Are you suggesting the reproduction steps are as follows?

1. install mmdeploy from the prebuilt package
2. do the model conversion
xduris1 commented 1 year ago

Yes, I was following the tutorial from the MMDet RTMDet config section (https://github.com/open-mmlab/mmdetection/tree/3.x/configs/rtmdet) and was installing for TensorRT conversion.

More specifically, I installed everything according to "Step 1. Install MMDeploy", then checked that the condition from the "Deploy RTMDet Instance Segmentation Model" section (MMDeploy >= v1.0.0rc2) is satisfied.
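As a quick sanity check of which mmdeploy is actually imported (useful when both a prebuilt package and a source checkout are installed), assuming the package exposes __version__ at the top level like other OpenMMLab projects:

import mmdeploy
print(mmdeploy.__version__)  # expected to report 1.0.0rc3 for the environment logged above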

flyzxm5177 commented 1 year ago

I had the same problem:

WARNING: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
03/06 22:04:09 - mmengine - INFO - Execute onnx optimize passes.
03/06 22:04:09 - mmengine - WARNING - Can not optimize model, please build torchscipt extension. More details: https://github.com/open-mmlab/mmdeploy/tree/1.x/docs/en/experimental/onnx_optimizer.md
03/06 22:04:09 - mmengine - INFO - Finish pipeline mmdeploy.apis.pytorch2onnx.torch2onnx
Traceback (most recent call last):
  File "mmdeploy/tools/deploy.py", line 335, in <module>
    main()
  File "mmdeploy/tools/deploy.py", line 229, in main
    if backend == Backend.VACC:
  File "/HOME/scz5563/.conda/envs/mmdet/lib/python3.8/enum.py", line 384, in __getattr__
    raise AttributeError(name) from None
AttributeError: VACC

but in my case I was deploying on CPU:

python mmdeploy/tools/deploy.py \
    mmdeploy/configs/mmdet/instance-seg/instance-seg_rtmdet-ins_onnxruntime_static-640x640.py \
    mmdetection/configs/rtmdet/rtmdet-ins_tiny_8xb32-300e_coco.py \
    checkpoints/rtmdet-ins_tiny_8xb32-300e_coco_20221130_151727-ec670f7e.pth \
    mmdetection/demo/demo.jpg \
    --work-dir ./work_dirs/rtmdet-ins \
    --device cpu

Waiting for a solution...

xduris1 commented 1 year ago

@flyzxm5177 I mentioned a workaround in the initial post.

Possible fix:
Comment lines 229 - 253 in deploy.py until issue is resolved.
Then the script finishes successfully.

I was able to convert to TensorRT successfully after this; however, I do not consider it a good fix. I hope it helps you as a temporary workaround for the time being.

flyzxm5177 commented 1 year ago

It works, thank you!
