open-mmlab / mmdeploy

OpenMMLab Model Deployment Framework
https://mmdeploy.readthedocs.io/en/latest/
Apache License 2.0

[Bug] Converting in int8 fails #2073

Closed DaraOrange closed 1 year ago

DaraOrange commented 1 year ago


Describe the bug

I'm trying to convert my model to int8, but the build fails on Softmax (it has no int8 implementation). I tried enabling the PREFER_PRECISION_CONSTRAINTS builder flag, but it has no effect. What should I do to automatically fall back such layers to fp16? (Converting the whole model to fp16 works correctly.)
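A possible workaround, sketched below under the assumption that you can hook into the engine-building step where the TensorRT network and builder config are available (the helper name `force_fp16_softmax` is hypothetical, not an mmdeploy API): use TensorRT's per-layer precision API to pin Softmax layers to fp16 while leaving the rest of the network free to build in int8.

```python
def force_fp16_softmax(network, config):
    """Sketch: pin every Softmax layer to fp16 so the int8 builder
    can fall back instead of failing with 'no implementation found'.

    `network` is a trt.INetworkDefinition, `config` a trt.IBuilderConfig.
    """
    import tensorrt as trt  # assumes TensorRT Python bindings are installed

    # Enable both precisions so the builder may mix them per layer.
    config.set_flag(trt.BuilderFlag.FP16)
    config.set_flag(trt.BuilderFlag.INT8)
    # Ask the builder to respect the per-layer precisions set below
    # whenever an implementation at that precision exists.
    config.set_flag(trt.BuilderFlag.PREFER_PRECISION_CONSTRAINTS)

    for i in range(network.num_layers):
        layer = network.get_layer(i)
        if layer.type == trt.LayerType.SOFTMAX:
            layer.precision = trt.float16
            layer.set_output_type(0, trt.float16)
```

Note that precision constraints only take effect if the corresponding builder flag (PREFER_ or OBEY_PRECISION_CONSTRAINTS) is set on the same `config` object that is later passed to `build_serialized_network`; setting the flag on a different config instance would explain why it appears to have no effect.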

Reproduction

MODEL="mask_rcnn_internimage_t_fpn_3x_coco"
CKPT_PATH="/home/dara-orange/workdir/scripts/checkpoints/mask_rcnn_internimage_t_fpn_3x_coco.pth"

python deploy.py \
    "./deploy/configs/mmdet/instance-seg/instance-seg_tensorrt_dynamic-320x320-1344x1344.py" \
    "./configs/coco/${MODEL}.py" \
    "${CKPT_PATH}" \
    "../../../data/Retinaface/many_people_fast_frames/1.jpg" \
    --work-dir "../checkpoints/${MODEL}_no_mask_int8.trt" \
    --device cuda \
    --dump-info \
    --quant

Environment

/home/dara-orange/anaconda3/envs/internimage_deploy_py38/lib/python3.8/site-packages/mmcv/__init__.py:20: UserWarning: On January 1, 2023, MMCV will release v2.0.0, in which it will remove components related to the training process and add a data transformation module. In addition, it will rename the package names mmcv to mmcv-lite and mmcv-full to mmcv. See https://github.com/open-mmlab/mmcv/blob/master/docs/en/compatibility.md for more details.
  warnings.warn(
2023-05-11 17:45:27,665 - mmdeploy - INFO - 

2023-05-11 17:45:27,665 - mmdeploy - INFO - **********Environmental information**********
2023-05-11 17:45:28,070 - mmdeploy - INFO - sys.platform: linux
2023-05-11 17:45:28,070 - mmdeploy - INFO - Python: 3.8.16 (default, Mar  2 2023, 03:21:46) [GCC 11.2.0]
2023-05-11 17:45:28,070 - mmdeploy - INFO - CUDA available: True
2023-05-11 17:45:28,071 - mmdeploy - INFO - GPU 0,1,2,3,4,5,6,7: Tesla V100-PCIE-32GB
2023-05-11 17:45:28,071 - mmdeploy - INFO - CUDA_HOME: /
2023-05-11 17:45:28,071 - mmdeploy - INFO - NVCC: Cuda compilation tools, release 11.7, V11.7.99
2023-05-11 17:45:28,071 - mmdeploy - INFO - GCC: gcc (Ubuntu 9.4.0-1ubuntu1~16.04) 9.4.0
2023-05-11 17:45:28,071 - mmdeploy - INFO - PyTorch: 1.11.0+cu113
2023-05-11 17:45:28,071 - mmdeploy - INFO - PyTorch compiling details: PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) oneAPI Math Kernel Library Version 2023.1-Product Build 20230303 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.5.2 (Git Hash a9302535553c73243c632ad3c4c80beec3d19a1e)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 11.3
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86
  - CuDNN 8.2
  - Magma 2.5.2
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.3, CUDNN_VERSION=8.2.0, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.11.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, 

2023-05-11 17:45:28,071 - mmdeploy - INFO - TorchVision: 0.12.0+cu113
2023-05-11 17:45:28,071 - mmdeploy - INFO - OpenCV: 4.7.0
2023-05-11 17:45:28,071 - mmdeploy - INFO - MMCV: 1.7.1
2023-05-11 17:45:28,071 - mmdeploy - INFO - MMCV Compiler: GCC 9.3
2023-05-11 17:45:28,071 - mmdeploy - INFO - MMCV CUDA Compiler: 11.3
2023-05-11 17:45:28,071 - mmdeploy - INFO - MMDeploy: 0.14.0+335ef86
2023-05-11 17:45:28,071 - mmdeploy - INFO - 

2023-05-11 17:45:28,071 - mmdeploy - INFO - **********Backend information**********
2023-05-11 17:45:28,132 - mmdeploy - INFO - tensorrt:   8.6.0
2023-05-11 17:45:28,132 - mmdeploy - INFO - tensorrt custom ops:        Available
2023-05-11 17:45:28,136 - mmdeploy - INFO - ONNXRuntime:        None
2023-05-11 17:45:28,139 - mmdeploy - INFO - pplnn:      None
2023-05-11 17:45:28,144 - mmdeploy - INFO - ncnn:       None
2023-05-11 17:45:28,149 - mmdeploy - INFO - snpe:       None
2023-05-11 17:45:28,151 - mmdeploy - INFO - openvino:   None
2023-05-11 17:45:28,153 - mmdeploy - INFO - torchscript:        1.11.0+cu113
2023-05-11 17:45:28,153 - mmdeploy - INFO - torchscript custom ops:     NotAvailable
2023-05-11 17:45:28,200 - mmdeploy - INFO - rknn-toolkit:       None
2023-05-11 17:45:28,200 - mmdeploy - INFO - rknn2-toolkit:      None
2023-05-11 17:45:28,203 - mmdeploy - INFO - ascend:     None
2023-05-11 17:45:28,205 - mmdeploy - INFO - coreml:     None
2023-05-11 17:45:28,207 - mmdeploy - INFO - tvm:        None
2023-05-11 17:45:28,208 - mmdeploy - INFO - 

2023-05-11 17:45:28,208 - mmdeploy - INFO - **********Codebase information**********
2023-05-11 17:45:28,210 - mmdeploy - INFO - mmdet:      2.28.2
2023-05-11 17:45:28,210 - mmdeploy - INFO - mmseg:      None
2023-05-11 17:45:28,210 - mmdeploy - INFO - mmcls:      None
2023-05-11 17:45:28,210 - mmdeploy - INFO - mmocr:      None
2023-05-11 17:45:28,210 - mmdeploy - INFO - mmedit:     None
2023-05-11 17:45:28,210 - mmdeploy - INFO - mmdet3d:    None
2023-05-11 17:45:28,210 - mmdeploy - INFO - mmpose:     None
2023-05-11 17:45:28,210 - mmdeploy - INFO - mmrotate:   None
2023-05-11 17:45:28,210 - mmdeploy - INFO - mmaction:   None

Error traceback

[05/11/2023-16:43:15] [TRT] [I] BuilderFlag::kTF32 is set but hardware does not support TF32. Disabling TF32.
[05/11/2023-16:43:15] [TRT] [W] The CUDA context changed between createInferBuilder and buildSerializedNetwork. A Builder holds CUDA resources which cannot be shared across CUDA contexts, so access these in different CUDA context results in undefined behavior. If using pycuda, try import pycuda.autoinit before importing tensorrt.
[05/11/2023-16:43:23] [TRT] [I] Graph optimization time: 0.427641 seconds.
[05/11/2023-16:43:23] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +216, GPU +98, now: CPU 1355, GPU 3928 (MiB)
[05/11/2023-16:43:23] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +103, GPU +98, now: CPU 1458, GPU 4026 (MiB)
[05/11/2023-16:43:23] [TRT] [W] TensorRT was linked against cuDNN 8.8.0 but loaded cuDNN 8.0.0
[05/11/2023-16:43:23] [TRT] [W] BuilderFlag::kENABLE_TACTIC_HEURISTIC has been ignored in this builder run. This feature is only supported on Ampere and beyond.
[05/11/2023-16:43:23] [TRT] [I] Timing cache disabled. Turning it on will improve builder speed.
[05/11/2023-16:43:23] [TRT] [W] Calibration Profile is not defined. Calibrating with Profile 0
[05/11/2023-16:43:28] [TRT] [E] 10: Could not find any implementation for node Softmax_85.
[05/11/2023-16:43:28] [TRT] [E] 10: [optimizer.cpp::computeCosts::3873] Error Code 10: Internal Error (Could not find any implementation for node Softmax_85.)
grimoire commented 1 year ago

Int8 and fp16 can be enabled at the same time. No extra flags are required.

DaraOrange commented 1 year ago

But why does the problem with softmax appear then?

grimoire commented 1 year ago

Enlarging the max workspace size might help.

DaraOrange commented 1 year ago

My config:

backend_config = dict(
    type='tensorrt',
    common_config=dict(
        fp16_mode=True, int8_mode=True, max_workspace_size=2147483648),
    model_inputs=[
        dict(
            input_shapes=dict(
                input=dict(
                    min_shape=[1, 3, 320, 320],
                    opt_shape=[1, 3, 800, 1344],
                    max_shape=[1, 3, 1344, 1344])))
    ])
calib_config = dict(create_calib=True, calib_file='calib_data.h5')

I think the workspace size is already at its maximum (1 << 31).
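As a side note (a minimal sanity check, nothing mmdeploy-specific): `max_workspace_size` is expressed in bytes, so the shift expressions being discussed translate as follows.

```python
# Convert the shift-expression workspace sizes to human-readable GiB.
def to_gib(nbytes: int) -> float:
    return nbytes / (1 << 30)  # bytes per GiB

print(to_gib(1 << 31))  # 2.0       -> the 2147483648 in the config is 2 GiB
print(to_gib(1 << 50))  # 1048576.0 -> 1 PiB, far beyond any GPU's memory
```

Since 1 << 50 vastly exceeds device memory, TensorRT is effectively limited by available GPU memory anyway, so raising the value further cannot change the outcome.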

grimoire commented 1 year ago

InternImage is extremely large. I am afraid a 2 GB workspace might not be enough.

DaraOrange commented 1 year ago

I am trying to convert the tiny version of InternImage. What size do you think would be enough?

DaraOrange commented 1 year ago

I even tried 1 << 50; the same problem with softmax persists.

github-actions[bot] commented 1 year ago

This issue is marked as stale because it has been marked as invalid or awaiting response for 7 days without any further response. It will be closed in 5 days if the stale label is not removed or if there is no further response.

github-actions[bot] commented 1 year ago

This issue is closed because it has been stale for 5 days. Please open a new issue if you have similar issues or you have any new updates now.

xuweidongkobe commented 1 year ago

I have met the same problem, and enlarging the max workspace size does not help. Has this been solved? Can you give me some advice? Thanks. @DaraOrange @RunningLeon @grimoire