open-mmlab / mmdeploy

OpenMMLab Model Deployment Framework
https://mmdeploy.readthedocs.io/en/latest/
Apache License 2.0

TRT failed to set binding dimension #472

Closed austinmw closed 2 years ago

austinmw commented 2 years ago

Hi, I ran the following TRT conversion to produce an end2end.engine file:

python /root/workspace/mmdeploy/tools/deploy.py \
/root/workspace/mmdeploy/configs/mmdet/detection/detection_tensorrt_dynamic-320x320-1344x1344.py \
/mmdetection/configs/yolox/yolox_s_8x8_300e_coco.py \
https://download.openmmlab.com/mmdetection/v2.0/yolox/yolox_s_8x8_300e_coco/yolox_s_8x8_300e_coco_20211121_095711-4592a793.pth \
/mmdetection/demo/demo.jpg \
--device cuda:0

However, when I try to load this in Triton, I get the error:

failed to load 'yolox' version 1: Internal: trt failed to set binding dimension to [1,3,1344,1344] for input 'input' for yolox_0

Do I need a specific config.pbtxt file to go along with this?

Also, not sure if related, but when trying to add a config.pbtxt with a max_batch_size: 4 I get the error:

model_repository_manager.cc:1234] failed to load 'yolox' version 1: Internal: autofill failed for model 'yolox': configuration specified max-batch 4 but TensorRT engine only supports max-batch 1
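
The autofill error above suggests the engine's optimization profile was built with a maximum batch of 1 while config.pbtxt asked for 4. As a rough sketch (not Triton's actual validation code), the relationship can be checked in plain Python. The helper names `full_input_shape` and `batch_fits_profile` are hypothetical; the only Triton behavior assumed is that with `max_batch_size > 0` the batch dimension is implicit and is not listed in `dims`.

```python
from typing import List, Sequence


def full_input_shape(max_batch_size: int, dims: Sequence[int]) -> List[int]:
    """Expand config.pbtxt values into the full tensor shape.

    When max_batch_size > 0, Triton treats the batch dimension as implicit,
    so it is prepended here. When it is 0, dims already holds the full shape.
    """
    if max_batch_size > 0:
        return [max_batch_size, *dims]
    return list(dims)


def batch_fits_profile(max_batch_size: int, profile_max_shape: Sequence[int]) -> bool:
    """Return True if the configured max batch is within the profile's max batch."""
    return max_batch_size <= profile_max_shape[0]


# Hypothetical helper usage with values taken from the thread.
print(batch_fits_profile(4, [1, 3, 1344, 1344]))  # False: profile max batch is 1
print(batch_fits_profile(4, [4, 3, 1344, 1344]))  # True: profile max batch is 4
print(full_input_shape(8, [3, -1, -1]))           # [8, 3, -1, -1]
```

If the check fails, either lower `max_batch_size` in config.pbtxt or rebuild the engine with a larger batch in `max_shape`.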

(Apologies for creating multiple Q's while trying to get this YOLOX model working in Triton)

grimoire commented 2 years ago

Normally, a binding dimension failure means the input tensor's shape falls outside the limits of the optimization profile (the one used to convert the model; see base_tensorrt_dynamic-320x320-1344x1344.py for details). In theory, [1,3,1344,1344] should not violate that limit, so I am not sure why this happens. (Sorry, I have no experience with the Triton server.)
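
To make the "outside the limit of the profile" condition concrete, here is a minimal sketch of the range check in plain Python. It is not TensorRT's implementation. The function name `shape_within_profile` is hypothetical, and the min/max values are assumed from the config name and the error messages in this thread.

```python
from typing import Sequence


def shape_within_profile(shape: Sequence[int],
                         min_shape: Sequence[int],
                         max_shape: Sequence[int]) -> bool:
    """Return True if every dimension of `shape` lies within [min, max]."""
    if not (len(shape) == len(min_shape) == len(max_shape)):
        return False
    return all(lo <= dim <= hi
               for dim, lo, hi in zip(shape, min_shape, max_shape))


# Assumed profile for the 320x320-1344x1344 dynamic config (batch fixed at 1).
MIN_SHAPE = [1, 3, 320, 320]
MAX_SHAPE = [1, 3, 1344, 1344]

print(shape_within_profile([1, 3, 1344, 1344], MIN_SHAPE, MAX_SHAPE))  # True
print(shape_within_profile([4, 3, 1344, 1344], MIN_SHAPE, MAX_SHAPE))  # False
```

A shape that passes this check but still fails to bind, as [1,3,1344,1344] does here, points to a problem other than the profile limits.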

If you want to deploy a model with multi-batch support, you also need to edit the profile in the config file:

input_shapes=dict(
    input=dict(
        min_shape=[1, 3, 320, 320],
        opt_shape=[1, 3, 800, 1344],
        max_shape=[4, 3, 1344, 1344]))  # set the max batch size in the profile

austinmw commented 2 years ago

Thanks. It seems the binding error always reports a size matching whatever height and width I set as max_shape in the config. I changed my config to this:

backend_config = dict(
    common_config=dict(max_workspace_size=1 << 33),
    model_inputs=[
        dict(
            input_shapes=dict(
                input=dict(
                    min_shape=[1, 3, 416, 416],
                    opt_shape=[1, 3, 640, 640],
                    max_shape=[8, 3, 960, 960])))
    ])

And now I get:

error: creating server: Invalid argument - load failed for model 'yolox': version 1: Internal: trt failed to set binding dimension to [8,3,960,960] for input 'input' for yolox_0;

austinmw commented 2 years ago

Hi, here's a summary of my latest attempted inputs/outputs if someone gets a chance to help:

_base_ = ['../_base_/base_dynamic.py', '../../_base_/backends/tensorrt.py']

backend_config = dict(
    common_config=dict(max_workspace_size=1 << 33),
    model_inputs=[
        dict(
            input_shapes=dict(
                input=dict(
                    min_shape=[1, 3, 160, 160],
                    opt_shape=[8, 3, 416, 416],
                    max_shape=[8, 3, 640, 640])))
    ])

[I] Loading plugin library: /root/workspace/mmdeploy/build/lib/libmmdeploy_tensorrt_ops.so
[I] Loading bytes from /volume_share/end2end.engine
[I] ==== TensorRT Engine ====
    Name: Unnamed Network 0 | Explicit Batch Engine

    ---- 1 Engine Input(s) ----
    {input [dtype=float32, shape=(-1, 3, -1, -1)]}

    ---- 2 Engine Output(s) ----
    {dets [dtype=float32, shape=(-1, 100, 5)],
     labels [dtype=int32, shape=(-1, 100)]}

    ---- Memory ----
    Device Memory: 3294651904 bytes

    ---- 1 Profile(s) (3 Binding(s) Each) ----
    - Profile: 0
        Binding Index: 0 (Input)  [Name: input]  | Shapes: min=(1, 3, 160, 160), opt=(8, 3, 416, 416), max=(8, 3, 640, 640)
        Binding Index: 1 (Output) [Name: dets]   | Shape: (-1, 100, 5)
        Binding Index: 2 (Output) [Name: labels] | Shape: (-1, 100)

    ---- 267 Layer(s) ----

name: "yolox"
default_model_filename: "end2end.engine"
platform: "tensorrt_plan"

input [{
    name: "input"
    data_type: TYPE_FP32
    format: FORMAT_NONE
    dims: [3, -1, -1]
}]   

output [
  {
    name: "dets"
    data_type: TYPE_FP32
    dims: [100, 5]
  },    
  {
    name: "labels"
    data_type: TYPE_INT32
    dims: [100]
  }    
]

instance_group {
  count: 1
  kind: KIND_GPU
}    
max_batch_size: 8
dynamic_batching {
}

LD_PRELOAD=libmmdeploy_tensorrt_ops.so tritonserver --allow-sagemaker=true --allow-grpc=false --allow-http=false --allow-metrics=false --model-control-mode=explicit $SAGEMAKER_ARGS

Yet I still end up with the following error:

=============================
== Triton Inference Server ==
=============================

NVIDIA Release 22.04 (build 36821869)
Triton Server Version 2.21.0

Copyright (c) 2018-2022, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

NOTE: CUDA Forward Compatibility mode ENABLED.
  Using CUDA 11.6 driver version 510.47.03 with kernel driver version 450.142.00.
  See https://docs.nvidia.com/deploy/cuda-compatibility/ for details.

I0526 23:57:56.842470 90 libtorch.cc:1381] TRITONBACKEND_Initialize: pytorch
I0526 23:57:56.842620 90 libtorch.cc:1391] Triton TRITONBACKEND API version: 1.9
I0526 23:57:56.842648 90 libtorch.cc:1397] 'pytorch' TRITONBACKEND API version: 1.9
2022-05-26 23:57:57.067249: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
I0526 23:57:57.113716 90 tensorflow.cc:2181] TRITONBACKEND_Initialize: tensorflow
I0526 23:57:57.113763 90 tensorflow.cc:2191] Triton TRITONBACKEND API version: 1.9
I0526 23:57:57.113787 90 tensorflow.cc:2197] 'tensorflow' TRITONBACKEND API version: 1.9
I0526 23:57:57.113804 90 tensorflow.cc:2221] backend configuration:
{}
I0526 23:57:57.115651 90 onnxruntime.cc:2400] TRITONBACKEND_Initialize: onnxruntime
I0526 23:57:57.115679 90 onnxruntime.cc:2410] Triton TRITONBACKEND API version: 1.9
I0526 23:57:57.115693 90 onnxruntime.cc:2416] 'onnxruntime' TRITONBACKEND API version: 1.9
I0526 23:57:57.115702 90 onnxruntime.cc:2446] backend configuration:
{}
I0526 23:57:57.138422 90 openvino.cc:1207] TRITONBACKEND_Initialize: openvino
I0526 23:57:57.138464 90 openvino.cc:1217] Triton TRITONBACKEND API version: 1.9
I0526 23:57:57.138482 90 openvino.cc:1223] 'openvino' TRITONBACKEND API version: 1.9
I0526 23:57:59.045217 90 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7f717c000000' with size 268435456
I0526 23:57:59.045655 90 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
I0526 23:57:59.047689 90 model_repository_manager.cc:1077] loading: yolox:1
I0526 23:57:59.148744 90 tensorrt.cc:5294] TRITONBACKEND_Initialize: tensorrt
I0526 23:57:59.148796 90 tensorrt.cc:5304] Triton TRITONBACKEND API version: 1.9
I0526 23:57:59.148829 90 tensorrt.cc:5310] 'tensorrt' TRITONBACKEND API version: 1.9
I0526 23:57:59.148967 90 tensorrt.cc:5353] backend configuration:
{}
I0526 23:57:59.149008 90 tensorrt.cc:5405] TRITONBACKEND_ModelInitialize: yolox (version 1)
I0526 23:57:59.541930 90 logging.cc:49] [MemUsageChange] Init CUDA: CPU +252, GPU +0, now: CPU 1411, GPU 1013 (MiB)
I0526 23:57:59.612098 90 logging.cc:49] Loaded engine size: 37 MiB
I0526 23:58:00.262596 90 logging.cc:49] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +385, GPU +180, now: CPU 1883, GPU 1231 (MiB)
I0526 23:58:00.451435 90 logging.cc:49] [MemUsageChange] Init cuDNN: CPU +117, GPU +52, now: CPU 2000, GPU 1283 (MiB)
I0526 23:58:00.452936 90 logging.cc:49] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +35, now: CPU 0, GPU 35 (MiB)
I0526 23:58:00.480981 90 tensorrt.cc:5454] TRITONBACKEND_ModelInstanceInitialize: yolox_0 (GPU device 0)
I0526 23:58:00.482134 90 logging.cc:49] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 1913, GPU 1229 (MiB)
I0526 23:58:00.551111 90 logging.cc:49] Loaded engine size: 37 MiB
I0526 23:58:00.597552 90 logging.cc:49] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +10, now: CPU 2000, GPU 1275 (MiB)
I0526 23:58:00.599643 90 logging.cc:49] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 2000, GPU 1283 (MiB)
I0526 23:58:00.601113 90 logging.cc:49] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +35, now: CPU 0, GPU 35 (MiB)
I0526 23:58:00.608293 90 logging.cc:49] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +10, now: CPU 1924, GPU 1275 (MiB)
I0526 23:58:00.610127 90 logging.cc:49] [MemUsageChange] Init cuDNN: CPU +1, GPU +8, now: CPU 1925, GPU 1283 (MiB)
E0526 23:58:00.666938 90 logging.cc:43] 1: [raiiMyelinGraph.h::RAIIMyelinGraph::24] Error Code 1: Myelin (Compiled against cuDNN 10.2.2.0 but running against cuDNN 11.9.3.0.)
I0526 23:58:00.666989 90 logging.cc:49] Could not set default profile 0 for execution context. Profile index must be set explicitly.
I0526 23:58:00.667040 90 logging.cc:49] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +2754, now: CPU 0, GPU 2789 (MiB)
E0526 23:58:00.667103 90 logging.cc:43] 3: [executionContext.cpp::setBindingDimensions::926] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setBindingDimensions::926, condition: mOptimizationProfile >= 0 && mOptimizationProfile < mEngine.getNbOptimizationProfiles()
)
I0526 23:58:00.667145 90 tensorrt.cc:5492] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I0526 23:58:00.667178 90 tensorrt.cc:5431] TRITONBACKEND_ModelFinalize: delete model state
E0526 23:58:00.668017 90 model_repository_manager.cc:1234] failed to load 'yolox' version 1: Internal: trt failed to set binding dimension to [8,3,640,640] for input 'input' for yolox_0
I0526 23:58:00.668301 90 tritonserver.cc:2123] 
+----------------------------------+------------------------------------------+
| Option                           | Value                                    |
+----------------------------------+------------------------------------------+
| server_id                        | triton                                   |
| server_version                   | 2.21.0                                   |
| server_extensions                | classification sequence model_repository |
|                                  |  model_repository(unload_dependents) sch |
|                                  | edule_policy model_configuration system_ |
|                                  | shared_memory cuda_shared_memory binary_ |
|                                  | tensor_data statistics trace             |
| model_repository_path[0]         | /opt/ml/model/                           |
| model_control_mode               | MODE_EXPLICIT                            |
| startup_models_0                 | yolox                                    |
| strict_model_config              | 0                                        |
| rate_limit                       | OFF                                      |
| pinned_memory_pool_byte_size     | 268435456                                |
| cuda_memory_pool_byte_size{0}    | 67108864                                 |
| response_cache_byte_size         | 0                                        |
| min_supported_compute_capability | 6.0                                      |
| strict_readiness                 | 1                                        |
| exit_timeout                     | 30                                       |
+----------------------------------+------------------------------------------+

I0526 23:58:00.668354 90 server.cc:247] No server context available. Exiting immediately.
error: creating server: Invalid argument - load failed for model 'yolox': version 1: Internal: trt failed to set binding dimension to [8,3,640,640] for input 'input' for yolox_0;

Any help is greatly appreciated!
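
In the startup log above, the final "failed to set binding dimension" message is preceded by earlier error-severity lines. A small sketch like the following can pull them out in order so the first one is easy to spot. The regular expression assumes the `E<mmdd> <time> <pid> <file>:<line>]` prefix seen in this log, and `error_lines` is a hypothetical helper, not part of Triton.

```python
import re
from typing import List

# Matches error-severity lines such as:
# E0526 23:58:00.666938 90 logging.cc:43] <message>
ERROR_LINE = re.compile(r"^E\d{4} \d{2}:\d{2}:\d{2}\.\d+\s+\d+\s+(\S+)\]\s*(.*)$")


def error_lines(log_text: str) -> List[str]:
    """Return the message part of every error-severity line, in log order."""
    errors = []
    for line in log_text.splitlines():
        match = ERROR_LINE.match(line)
        if match:
            errors.append(match.group(2))
    return errors


# Four lines copied from the log in this thread.
sample = """\
I0526 23:58:00.610127 90 logging.cc:49] [MemUsageChange] Init cuDNN: CPU +1, GPU +8, now: CPU 1925, GPU 1283 (MiB)
E0526 23:58:00.666938 90 logging.cc:43] 1: [raiiMyelinGraph.h::RAIIMyelinGraph::24] Error Code 1: Myelin (Compiled against cuDNN 10.2.2.0 but running against cuDNN 11.9.3.0.)
I0526 23:58:00.666989 90 logging.cc:49] Could not set default profile 0 for execution context. Profile index must be set explicitly.
E0526 23:58:00.668017 90 model_repository_manager.cc:1234] failed to load 'yolox' version 1: Internal: trt failed to set binding dimension to [8,3,640,640] for input 'input' for yolox_0
"""

errors = error_lines(sample)
for message in errors:
    print(message)
```

On this sample the first extracted message is the Myelin line, which appears before the binding failure.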

grimoire commented 2 years ago

Is there any way to set the profile explicitly? The log contains this error:

condition: mOptimizationProfile >= 0 && mOptimizationProfile < mEngine.getNbOptimizationProfiles()

austinmw commented 2 years ago

It turned out to be a Myelin graph cuDNN compatibility error that I was not able to see until testing inference with trtexec. So I guess the moral of the story for me is to try verbose trtexec inference before attempting to load the engine into Triton. Thanks!
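
For anyone wanting to repeat that check, below is a hedged sketch that only assembles a trtexec command line and does not run it. The exact command used in this thread was not posted, so the flags (`--loadEngine`, `--shapes`, `--plugins`, `--verbose`) are an assumption based on trtexec's documented options and may differ between TensorRT versions. `trtexec_command` is a hypothetical helper.

```python
import shlex
from typing import List, Sequence


def trtexec_command(engine_path: str,
                    input_name: str,
                    shape: Sequence[int],
                    plugin_lib: str = "") -> List[str]:
    """Build a verbose trtexec invocation for a serialized engine."""
    shape_spec = "x".join(str(dim) for dim in shape)
    cmd = [
        "trtexec",
        f"--loadEngine={engine_path}",
        f"--shapes={input_name}:{shape_spec}",
        "--verbose",
    ]
    if plugin_lib:
        # Custom ops (for example the mmdeploy TensorRT plugin) must be loaded too.
        cmd.append(f"--plugins={plugin_lib}")
    return cmd


cmd = trtexec_command("end2end.engine", "input", [1, 3, 640, 640],
                      "libmmdeploy_tensorrt_ops.so")
print(shlex.join(cmd))  # paste the printed line into a shell to run it
```

Running the printed command inside the same container image that Triton uses makes library mismatches like the one above show up in the verbose output.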