open-mmlab / mmdeploy

OpenMMLab Model Deployment Framework
https://mmdeploy.readthedocs.io/en/latest/
Apache License 2.0

[Bug] engine is not None, 'Failed to create TensorRT engine' #1908

Closed makangzhe closed 1 year ago

makangzhe commented 1 year ago

Describe the bug

When I use the FP16 config, the conversion works normally. When I use the INT8 config, it fails with the error below. Please help.

Reproduction

CUDA_VISIBLE_DEVICES=3 python3 mmdeploy-master/tools/deploy.py \
mmdeploy-master/configs/mmseg/segmentation_tensorrt-int8_static-512x512.py \
mmsegmentation-master/idcard_config/deeplabv3plus_r18-d8_512x512_ps_v2.py \
weights/deeplab_v3_mosaic_2-1.pth \
mmsegmentation-master/demo/demo.png \
--work-dir mmdeploy_model/deeplab-v3-int8 \
--device cuda:0 \
--dump-info

Environment

2023-03-23 15:27:16,036 - mmdeploy - INFO - **********Environmental information**********
fatal: not a git repository (or any of the parent directories): .git
2023-03-23 15:27:16,496 - mmdeploy - INFO - sys.platform: linux
2023-03-23 15:27:16,497 - mmdeploy - INFO - Python: 3.6.13 |Anaconda, Inc.| (default, Jun  4 2021, 14:25:59) [GCC 7.5.0]
2023-03-23 15:27:16,497 - mmdeploy - INFO - CUDA available: True
2023-03-23 15:27:16,497 - mmdeploy - INFO - GPU 0,1,2,3,4,5,6,7: A100-PCIE-40GB
2023-03-23 15:27:16,497 - mmdeploy - INFO - CUDA_HOME: /usr/local/cuda
2023-03-23 15:27:16,498 - mmdeploy - INFO - NVCC: Cuda compilation tools, release 11.1, V11.1.105
2023-03-23 15:27:16,498 - mmdeploy - INFO - GCC: gcc (GCC) 7.3.1 20180303 (Red Hat 7.3.1-5)
2023-03-23 15:27:16,498 - mmdeploy - INFO - PyTorch: 1.8.2+cu111
2023-03-23 15:27:16,498 - mmdeploy - INFO - PyTorch compiling details: PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v1.7.0 (Git Hash 7aed236906b1f7a05c0917e5257a1af05e9ff683)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 11.1
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86
  - CuDNN 8.0.5
  - Magma 2.5.2
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.1, CUDNN_VERSION=8.0.5, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.8.2, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, 

2023-03-23 15:27:16,498 - mmdeploy - INFO - TorchVision: 0.9.2+cu111
2023-03-23 15:27:16,498 - mmdeploy - INFO - OpenCV: 4.6.0
2023-03-23 15:27:16,499 - mmdeploy - INFO - MMCV: 1.6.0
2023-03-23 15:27:16,499 - mmdeploy - INFO - MMCV Compiler: GCC 7.3
2023-03-23 15:27:16,499 - mmdeploy - INFO - MMCV CUDA Compiler: 11.1
2023-03-23 15:27:16,499 - mmdeploy - INFO - MMDeploy: 0.12.0+
2023-03-23 15:27:16,499 - mmdeploy - INFO - 

2023-03-23 15:27:16,499 - mmdeploy - INFO - **********Backend information**********
2023-03-23 15:27:16,617 - mmdeploy - INFO - tensorrt:   8.2.3.0
2023-03-23 15:27:16,618 - mmdeploy - INFO - tensorrt custom ops:        Available
2023-03-23 15:27:16,619 - mmdeploy - INFO - ONNXRuntime:        None
2023-03-23 15:27:16,619 - mmdeploy - INFO - pplnn:      None
2023-03-23 15:27:16,621 - mmdeploy - INFO - ncnn:       None
2023-03-23 15:27:16,624 - mmdeploy - INFO - snpe:       None
2023-03-23 15:27:16,626 - mmdeploy - INFO - openvino:   None
2023-03-23 15:27:16,629 - mmdeploy - INFO - torchscript:        1.8.2+cu111
2023-03-23 15:27:16,630 - mmdeploy - INFO - torchscript custom ops:     NotAvailable
2023-03-23 15:27:16,702 - mmdeploy - INFO - rknn-toolkit:       None
2023-03-23 15:27:16,702 - mmdeploy - INFO - rknn2-toolkit:      None
2023-03-23 15:27:16,703 - mmdeploy - INFO - ascend:     None
2023-03-23 15:27:16,704 - mmdeploy - INFO - coreml:     None
2023-03-23 15:27:16,705 - mmdeploy - INFO - tvm:        None
2023-03-23 15:27:16,705 - mmdeploy - INFO - 

2023-03-23 15:27:16,705 - mmdeploy - INFO - **********Codebase information**********
2023-03-23 15:27:16,707 - mmdeploy - INFO - mmdet:      None
2023-03-23 15:27:16,707 - mmdeploy - INFO - mmseg:      0.25.0
2023-03-23 15:27:16,707 - mmdeploy - INFO - mmcls:      0.25.0
2023-03-23 15:27:16,707 - mmdeploy - INFO - mmocr:      None
2023-03-23 15:27:16,707 - mmdeploy - INFO - mmedit:     None
2023-03-23 15:27:16,707 - mmdeploy - INFO - mmdet3d:    None
2023-03-23 15:27:16,707 - mmdeploy - INFO - mmpose:     None
2023-03-23 15:27:16,707 - mmdeploy - INFO - mmrotate:   None
2023-03-23 15:27:16,707 - mmdeploy - INFO - mmaction:   None

Error traceback

[03/23/2023-15:22:59] [TRT] [I]   Calibrated batch 1950 in 0.133268 seconds.
[03/23/2023-15:22:59] [TRT] [I]   Calibrated batch 1951 in 0.13315 seconds.
[03/23/2023-15:23:00] [TRT] [I]   Calibrated batch 1952 in 0.133595 seconds.
[03/23/2023-15:23:00] [TRT] [I]   Calibrated batch 1953 in 0.133264 seconds.
[03/23/2023-15:23:00] [TRT] [I]   Calibrated batch 1954 in 0.133336 seconds.
[03/23/2023-15:23:00] [TRT] [I]   Calibrated batch 1955 in 0.133323 seconds.
[03/23/2023-15:23:00] [TRT] [I]   Calibrated batch 1956 in 0.134092 seconds.
[03/23/2023-15:23:17] [TRT] [I]   Post Processing Calibration data in 16.5936 seconds.
[03/23/2023-15:23:17] [TRT] [I] Calibration completed in 336.825 seconds.
[03/23/2023-15:23:17] [TRT] [I] Writing Calibration Cache for calibrator: TRT-8203-EntropyCalibration2
[03/23/2023-15:23:21] [TRT] [W] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.5 but loaded cuBLAS/cuBLAS LT 11.3.0
[03/23/2023-15:23:21] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +10, now: CPU 2431, GPU 2187 (MiB)
[03/23/2023-15:23:21] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 2431, GPU 2195 (MiB)
[03/23/2023-15:23:21] [TRT] [W] TensorRT was linked against cuDNN 8.2.1 but loaded cuDNN 8.0.4
[03/23/2023-15:23:21] [TRT] [I] Local timing cache in use. Profiling results in this builder pass will not be stored.
[03/23/2023-15:23:30] [TRT] [E] 1: Unexpected exception 
Process Process-4:
Traceback (most recent call last):
  File "/usr/local/conda/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/local/conda/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/conda/lib/python3.6/site-packages/mmdeploy/apis/core/pipeline_manager.py", line 107, in __call__
    ret = func(*args, **kwargs)
  File "/usr/local/conda/lib/python3.6/site-packages/mmdeploy/apis/utils/utils.py", line 101, in to_backend
    **kwargs)
  File "/usr/local/conda/lib/python3.6/site-packages/mmdeploy/backend/tensorrt/backend_manager.py", line 136, in to_backend
    partition_type=partition_type)
  File "/usr/local/conda/lib/python3.6/site-packages/mmdeploy/backend/tensorrt/onnx2tensorrt.py", line 88, in onnx2tensorrt
    device_id=device_id)
  File "/usr/local/conda/lib/python3.6/site-packages/mmdeploy/backend/tensorrt/utils.py", line 233, in from_onnx
    assert engine is not None, 'Failed to create TensorRT engine'
AssertionError: Failed to create TensorRT engine
2023-03-23 15:23:33,043 - mmdeploy - ERROR - `mmdeploy.apis.utils.utils.to_backend` with Call id: 2 failed. exit.
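
The two `[W]` lines in the log above report that TensorRT was linked against newer cuBLAS and cuDNN versions than the ones loaded at runtime. Whether that is related to the failure is not established in this thread, but such warnings are easy to miss in a long calibration log. The following is a minimal sketch for pulling them out, assuming the log has been saved as text. `find_version_mismatches` and `find_errors` are hypothetical helper names, not part of MMDeploy or TensorRT.

```python
import re

# Sample lines copied from the log above.
LOG = """\
[03/23/2023-15:23:21] [TRT] [W] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.5 but loaded cuBLAS/cuBLAS LT 11.3.0
[03/23/2023-15:23:21] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 2431, GPU 2195 (MiB)
[03/23/2023-15:23:21] [TRT] [W] TensorRT was linked against cuDNN 8.2.1 but loaded cuDNN 8.0.4
[03/23/2023-15:23:30] [TRT] [E] 1: Unexpected exception
"""

# Matches "linked against <lib> <version> but loaded <lib> <version>".
MISMATCH = re.compile(
    r"linked against (?P<lib>.+?) (?P<linked>\d+(?:\.\d+)+) "
    r"but loaded (?P=lib) (?P<loaded>\d+(?:\.\d+)+)"
)


def find_version_mismatches(log_text):
    """Return (library, linked_version, loaded_version) per mismatch warning."""
    found = []
    for line in log_text.splitlines():
        match = MISMATCH.search(line)
        if match:
            found.append(
                (match.group("lib"), match.group("linked"), match.group("loaded"))
            )
    return found


def find_errors(log_text):
    """Return every line that TensorRT logged at error severity."""
    return [line for line in log_text.splitlines() if "[TRT] [E]" in line]


if __name__ == "__main__":
    for lib, linked, loaded in find_version_mismatches(LOG):
        print(f"{lib}: linked against {linked}, loaded {loaded}")
    for line in find_errors(LOG):
        print(line)
```

Running the same two helpers over the full output of a failed conversion gives a short list of the lines worth quoting in a bug report.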

makangzhe commented 1 year ago

_base_ = ['./segmentation_static.py', '../_base_/backends/tensorrt-int8.py']

onnx_config = dict(input_shape=[512, 512])
backend_config = dict(
    common_config=dict(max_workspace_size=1 << 30),
    model_inputs=[
        dict(
            input_shapes=dict(
                input=dict(
                    min_shape=[1, 3, 512, 512],
                    opt_shape=[1, 3, 512, 512],
                    max_shape=[1, 3, 512, 512])))
    ])
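
For reference, the backend settings in the config above can be read as plain Python values: `1 << 30` bytes is 1 GiB of builder workspace, and identical min/opt/max shapes mean the engine is built for a single static input size. A small sketch that makes both explicit follows. `summarize_backend_config` is a hypothetical helper written for illustration and is not an MMDeploy function.

```python
# The backend_config values as posted in the comment above.
backend_config = dict(
    common_config=dict(max_workspace_size=1 << 30),
    model_inputs=[
        dict(
            input_shapes=dict(
                input=dict(
                    min_shape=[1, 3, 512, 512],
                    opt_shape=[1, 3, 512, 512],
                    max_shape=[1, 3, 512, 512])))
    ])


def summarize_backend_config(cfg):
    """Return (workspace size in GiB, True if every input shape is static)."""
    workspace_gib = cfg["common_config"]["max_workspace_size"] / 2**30
    is_static = all(
        shapes["min_shape"] == shapes["opt_shape"] == shapes["max_shape"]
        for model_input in cfg["model_inputs"]
        for shapes in model_input["input_shapes"].values()
    )
    return workspace_gib, is_static


if __name__ == "__main__":
    workspace_gib, is_static = summarize_backend_config(backend_config)
    print(f"workspace: {workspace_gib} GiB, static shapes: {is_static}")
```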

RunningLeon commented 1 year ago

@makangzhe Hi, I can't get any useful information from the error log. Could you try running the following script and post the error log here?

CUDA_VISIBLE_DEVICES=3 \
python3 mmdeploy-master/tools/onnx2tensorrt.py \
mmdeploy-master/configs/mmseg/segmentation_tensorrt-int8_static-512x512.py \
mmdeploy_model/deeplab-v3-int8/end2end.onnx \
mmdeploy_model/deeplab-v3-int8/end2end \

github-actions[bot] commented 1 year ago

This issue is marked as stale because it has been marked as invalid or awaiting response for 7 days without any further response. It will be closed in 5 days if the stale label is not removed or if there is no further response.

github-actions[bot] commented 1 year ago

This issue is closed because it has been stale for 5 days. Please open a new issue if you have similar issues or you have any new updates now.