Closed: austinmw closed this issue 2 years ago.
Hi @austinmw Can you please provide the full command you are using to run tritonserver? This might be a case where you are using the --strict-model-config=false command line parameter and the autocomplete feature is being opinionated about the dimensions of your input. If this is the case, you can try changing this parameter to --strict-model-config=true. To better view what model configuration is being produced, you can use the --log-verbose=1 parameter for more verbose output.
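For example, a combined invocation might look like this (the model repository path is an assumption; adjust it to your setup):
tritonserver --model-repository=/opt/ml/model --strict-model-config=true --log-verbose=1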
@nv-kmcgill53 Thanks, sure:
LD_PRELOAD=libmmdeploy_tensorrt_ops.so tritonserver --allow-sagemaker=true --allow-grpc=false --allow-http=false --allow-metrics=false --model-control-mode=explicit $SAGEMAKER_ARGS
And in this case SAGEMAKER_ARGS is an empty string. With --strict-model-config=true and --log-verbose=1 I get this output:
=============================
== Triton Inference Server ==
=============================
NVIDIA Release 22.04 (build 36821869)
Triton Server Version 2.21.0
Copyright (c) 2018-2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
NOTE: CUDA Forward Compatibility mode ENABLED.
Using CUDA 11.6 driver version 510.47.03 with kernel driver version 450.142.00.
See https://docs.nvidia.com/deploy/cuda-compatibility/ for details.
WARNING: No SAGEMAKER_TRITON_DEFAULT_MODEL_NAME provided.
Starting with the only existing model directory yolox
I0513 02:16:16.258937 92 shared_library.cc:108] OpenLibraryHandle: /opt/tritonserver/backends/pytorch/libtriton_pytorch.so
I0513 02:16:16.620907 92 libtorch.cc:1381] TRITONBACKEND_Initialize: pytorch
I0513 02:16:16.620943 92 libtorch.cc:1391] Triton TRITONBACKEND API version: 1.9
I0513 02:16:16.620968 92 libtorch.cc:1397] 'pytorch' TRITONBACKEND API version: 1.9
I0513 02:16:16.621055 92 shared_library.cc:108] OpenLibraryHandle: /opt/tritonserver/backends/tensorflow1/libtriton_tensorflow1.so
2022-05-13 02:16:16.839091: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
I0513 02:16:16.884327 92 tensorflow.cc:2181] TRITONBACKEND_Initialize: tensorflow
I0513 02:16:16.884373 92 tensorflow.cc:2191] Triton TRITONBACKEND API version: 1.9
I0513 02:16:16.884381 92 tensorflow.cc:2197] 'tensorflow' TRITONBACKEND API version: 1.9
I0513 02:16:16.884396 92 tensorflow.cc:2221] backend configuration:
{}
I0513 02:16:16.884473 92 shared_library.cc:108] OpenLibraryHandle: /opt/tritonserver/backends/onnxruntime/libtriton_onnxruntime.so
I0513 02:16:16.886438 92 onnxruntime.cc:2400] TRITONBACKEND_Initialize: onnxruntime
I0513 02:16:16.886469 92 onnxruntime.cc:2410] Triton TRITONBACKEND API version: 1.9
I0513 02:16:16.886483 92 onnxruntime.cc:2416] 'onnxruntime' TRITONBACKEND API version: 1.9
I0513 02:16:16.886497 92 onnxruntime.cc:2446] backend configuration:
{}
I0513 02:16:16.899768 92 shared_library.cc:108] OpenLibraryHandle: /opt/tritonserver/backends/openvino_2021_4/libtriton_openvino_2021_4.so
I0513 02:16:16.909598 92 openvino.cc:1207] TRITONBACKEND_Initialize: openvino
I0513 02:16:16.909627 92 openvino.cc:1217] Triton TRITONBACKEND API version: 1.9
I0513 02:16:16.909645 92 openvino.cc:1223] 'openvino' TRITONBACKEND API version: 1.9
I0513 02:16:18.751969 92 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7fdfa0000000' with size 268435456
I0513 02:16:18.752475 92 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
I0513 02:16:18.754073 92 model_config_utils.cc:648] Server side auto-completed config: name: "yolox"
platform: "tensorrt_plan"
max_batch_size: 8
input {
name: "input"
data_type: TYPE_FP32
dims: 3
dims: 800
dims: 1344
}
output {
name: "dets"
data_type: TYPE_FP32
dims: 100
dims: 5
}
output {
name: "labels"
data_type: TYPE_INT32
dims: 100
}
instance_group {
count: 1
kind: KIND_GPU
}
default_model_filename: "end2end.engine"
dynamic_batching {
}
model_warmup {
name: "warmup"
batch_size: 8
inputs {
key: "input"
value {
data_type: TYPE_FP32
dims: 3
dims: 800
dims: 1344
zero_data: false
}
}
}
backend: "tensorrt"
I0513 02:16:18.754647 92 model_repository_manager.cc:1077] loading: yolox:1
I0513 02:16:18.855046 92 shared_library.cc:108] OpenLibraryHandle: /opt/tritonserver/backends/tensorrt/libtriton_tensorrt.so
I0513 02:16:18.855761 92 tensorrt.cc:5294] TRITONBACKEND_Initialize: tensorrt
I0513 02:16:18.855792 92 tensorrt.cc:5304] Triton TRITONBACKEND API version: 1.9
I0513 02:16:18.855815 92 tensorrt.cc:5310] 'tensorrt' TRITONBACKEND API version: 1.9
I0513 02:16:18.855826 92 tensorrt.cc:5333] Registering TensorRT Plugins
I0513 02:16:18.855856 92 logging.cc:52] Plugin creator already registered - ::BatchTilePlugin_TRT version 1
I0513 02:16:18.855880 92 logging.cc:52] Plugin creator already registered - ::BatchedNMS_TRT version 1
I0513 02:16:18.855905 92 logging.cc:52] Plugin creator already registered - ::BatchedNMSDynamic_TRT version 1
I0513 02:16:18.855918 92 logging.cc:52] Plugin creator already registered - ::CoordConvAC version 1
I0513 02:16:18.855934 92 logging.cc:52] Plugin creator already registered - ::CropAndResize version 1
I0513 02:16:18.855955 92 logging.cc:52] Plugin creator already registered - ::CropAndResizeDynamic version 1
I0513 02:16:18.855968 92 logging.cc:52] Plugin creator already registered - ::DecodeBbox3DPlugin version 1
I0513 02:16:18.855991 92 logging.cc:52] Plugin creator already registered - ::DetectionLayer_TRT version 1
I0513 02:16:18.856003 92 logging.cc:52] Plugin creator already registered - ::EfficientNMS_TRT version 1
I0513 02:16:18.856027 92 logging.cc:52] Plugin creator already registered - ::EfficientNMS_ONNX_TRT version 1
I0513 02:16:18.856050 92 logging.cc:52] Plugin creator already registered - ::EfficientNMS_Explicit_TF_TRT version 1
I0513 02:16:18.856063 92 logging.cc:52] Plugin creator already registered - ::EfficientNMS_Implicit_TF_TRT version 1
I0513 02:16:18.856079 92 logging.cc:52] Plugin creator already registered - ::FlattenConcat_TRT version 1
I0513 02:16:18.856102 92 logging.cc:52] Plugin creator already registered - ::GenerateDetection_TRT version 1
I0513 02:16:18.856125 92 logging.cc:52] Plugin creator already registered - ::GridAnchor_TRT version 1
I0513 02:16:18.856156 92 logging.cc:52] Plugin creator already registered - ::GridAnchorRect_TRT version 1
I0513 02:16:18.856177 92 logging.cc:52] Plugin creator already registered - ::InstanceNormalization_TRT version 1
I0513 02:16:18.856201 92 logging.cc:52] Plugin creator already registered - ::LReLU_TRT version 1
I0513 02:16:18.856223 92 logging.cc:52] Plugin creator already registered - ::MultilevelCropAndResize_TRT version 1
I0513 02:16:18.856242 92 logging.cc:52] Plugin creator already registered - ::MultilevelProposeROI_TRT version 1
I0513 02:16:18.856252 92 logging.cc:52] Plugin creator already registered - ::NMS_TRT version 1
I0513 02:16:18.856274 92 logging.cc:52] Plugin creator already registered - ::NMSDynamic_TRT version 1
I0513 02:16:18.856286 92 logging.cc:52] Plugin creator already registered - ::Normalize_TRT version 1
I0513 02:16:18.856305 92 logging.cc:52] Plugin creator already registered - ::PillarScatterPlugin version 1
I0513 02:16:18.856317 92 logging.cc:52] Plugin creator already registered - ::PriorBox_TRT version 1
I0513 02:16:18.856336 92 logging.cc:52] Plugin creator already registered - ::ProposalLayer_TRT version 1
I0513 02:16:18.856355 92 logging.cc:52] Plugin creator already registered - ::Proposal version 1
I0513 02:16:18.856368 92 logging.cc:52] Plugin creator already registered - ::ProposalDynamic version 1
I0513 02:16:18.856382 92 logging.cc:52] Plugin creator already registered - ::PyramidROIAlign_TRT version 1
I0513 02:16:18.856394 92 logging.cc:52] Plugin creator already registered - ::Region_TRT version 1
I0513 02:16:18.856416 92 logging.cc:52] Plugin creator already registered - ::Reorg_TRT version 1
I0513 02:16:18.856434 92 logging.cc:52] Plugin creator already registered - ::ResizeNearest_TRT version 1
I0513 02:16:18.856451 92 logging.cc:52] Plugin creator already registered - ::RPROI_TRT version 1
I0513 02:16:18.856463 92 logging.cc:52] Plugin creator already registered - ::ScatterND version 1
I0513 02:16:18.856481 92 logging.cc:52] Plugin creator already registered - ::SpecialSlice_TRT version 1
I0513 02:16:18.856495 92 logging.cc:52] Plugin creator already registered - ::Split version 1
I0513 02:16:18.856517 92 logging.cc:52] Plugin creator already registered - ::VoxelGeneratorPlugin version 1
I0513 02:16:18.856537 92 tensorrt.cc:5353] backend configuration:
{}
I0513 02:16:18.856584 92 tensorrt.cc:5405] TRITONBACKEND_ModelInitialize: yolox (version 1)
I0513 02:16:18.857796 92 model_config_utils.cc:1592] ModelConfig 64-bit fields:
I0513 02:16:18.857824 92 model_config_utils.cc:1594] ModelConfig::dynamic_batching::default_queue_policy::default_timeout_microseconds
I0513 02:16:18.857829 92 model_config_utils.cc:1594] ModelConfig::dynamic_batching::max_queue_delay_microseconds
I0513 02:16:18.857847 92 model_config_utils.cc:1594] ModelConfig::dynamic_batching::priority_queue_policy::value::default_timeout_microseconds
I0513 02:16:18.857854 92 model_config_utils.cc:1594] ModelConfig::ensemble_scheduling::step::model_version
I0513 02:16:18.857867 92 model_config_utils.cc:1594] ModelConfig::input::dims
I0513 02:16:18.857871 92 model_config_utils.cc:1594] ModelConfig::input::reshape::shape
I0513 02:16:18.857885 92 model_config_utils.cc:1594] ModelConfig::instance_group::secondary_devices::device_id
I0513 02:16:18.857892 92 model_config_utils.cc:1594] ModelConfig::model_warmup::inputs::value::dims
I0513 02:16:18.857899 92 model_config_utils.cc:1594] ModelConfig::optimization::cuda::graph_spec::graph_lower_bound::input::value::dim
I0513 02:16:18.857913 92 model_config_utils.cc:1594] ModelConfig::optimization::cuda::graph_spec::input::value::dim
I0513 02:16:18.857929 92 model_config_utils.cc:1594] ModelConfig::output::dims
I0513 02:16:18.857940 92 model_config_utils.cc:1594] ModelConfig::output::reshape::shape
I0513 02:16:18.857954 92 model_config_utils.cc:1594] ModelConfig::sequence_batching::direct::max_queue_delay_microseconds
I0513 02:16:18.857965 92 model_config_utils.cc:1594] ModelConfig::sequence_batching::max_sequence_idle_microseconds
I0513 02:16:18.857975 92 model_config_utils.cc:1594] ModelConfig::sequence_batching::oldest::max_queue_delay_microseconds
I0513 02:16:18.857988 92 model_config_utils.cc:1594] ModelConfig::sequence_batching::state::dims
I0513 02:16:18.857998 92 model_config_utils.cc:1594] ModelConfig::sequence_batching::state::initial_state::dims
I0513 02:16:18.858009 92 model_config_utils.cc:1594] ModelConfig::version_policy::specific::versions
I0513 02:16:18.858221 92 tensorrt.cc:439] model configuration:
{
"name": "yolox",
"platform": "tensorrt_plan",
"backend": "tensorrt",
"version_policy": {
"latest": {
"num_versions": 1
}
},
"max_batch_size": 8,
"input": [
{
"name": "input",
"data_type": "TYPE_FP32",
"format": "FORMAT_NONE",
"dims": [
3,
800,
1344
],
"is_shape_tensor": false,
"allow_ragged_batch": false,
"optional": false
}
],
"output": [
{
"name": "dets",
"data_type": "TYPE_FP32",
"dims": [
100,
5
],
"label_filename": "",
"is_shape_tensor": false
},
{
"name": "labels",
"data_type": "TYPE_INT32",
"dims": [
100
],
"label_filename": "",
"is_shape_tensor": false
}
],
"batch_input": [],
"batch_output": [],
"optimization": {
"priority": "PRIORITY_DEFAULT",
"input_pinned_memory": {
"enable": true
},
"output_pinned_memory": {
"enable": true
},
"gather_kernel_buffer_threshold": 0,
"eager_batching": false
},
"dynamic_batching": {
"preferred_batch_size": [
8
],
"max_queue_delay_microseconds": 0,
"preserve_ordering": false,
"priority_levels": 0,
"default_priority_level": 0,
"priority_queue_policy": {}
},
"instance_group": [
{
"name": "yolox_0",
"kind": "KIND_GPU",
"count": 1,
"gpus": [
0
],
"secondary_devices": [],
"profile": [],
"passive": false,
"host_policy": ""
}
],
"default_model_filename": "end2end.engine",
"cc_model_filenames": {},
"metric_tags": {},
"parameters": {},
"model_warmup": [
{
"name": "warmup",
"batch_size": 8,
"inputs": {
"input": {
"data_type": "TYPE_FP32",
"dims": [
3,
800,
1344
],
"zero_data": false
}
}
}
]
}
I0513 02:16:18.858647 92 tensorrt.cc:5454] TRITONBACKEND_ModelInstanceInitialize: yolox_0 (GPU device 0)
I0513 02:16:18.859236 92 backend_model_instance.cc:105] Creating instance yolox_0 on GPU 0 (7.0) using artifact 'end2end.engine'
I0513 02:16:18.859715 92 tensorrt.cc:1485] Zero copy optimization is disabled
I0513 02:16:19.248961 92 logging.cc:49] [MemUsageChange] Init CUDA: CPU +252, GPU +0, now: CPU 1411, GPU 1013 (MiB)
I0513 02:16:19.287835 92 logging.cc:49] Loaded engine size: 21 MiB
I0513 02:16:19.932722 92 logging.cc:52] Using cublasLt as a tactic source
I0513 02:16:19.932884 92 logging.cc:49] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +385, GPU +178, now: CPU 1854, GPU 1215 (MiB)
I0513 02:16:19.933075 92 logging.cc:52] Using cuDNN as a tactic source
I0513 02:16:20.129675 92 logging.cc:49] [MemUsageChange] Init cuDNN: CPU +116, GPU +54, now: CPU 1970, GPU 1269 (MiB)
I0513 02:16:20.131117 92 logging.cc:52] Deserialization required 842793 microseconds.
I0513 02:16:20.131156 92 logging.cc:49] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +18, now: CPU 0, GPU 18 (MiB)
I0513 02:16:20.131172 92 tensorrt.cc:387] Created new runtime on GPU device 0, NVDLA core -1 for yolox
I0513 02:16:20.131189 92 tensorrt.cc:394] Created new engine on GPU device 0, NVDLA core -1 for yolox
I0513 02:16:20.132439 92 logging.cc:52] Using cublasLt as a tactic source
I0513 02:16:20.132538 92 logging.cc:49] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +10, now: CPU 1927, GPU 1261 (MiB)
I0513 02:16:20.132699 92 logging.cc:52] Using cuDNN as a tactic source
I0513 02:16:20.134021 92 logging.cc:49] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 1927, GPU 1269 (MiB)
I0513 02:16:20.134441 92 logging.cc:52] Total per-runner device persistent memory is 19213824
I0513 02:16:20.134467 92 logging.cc:52] Total per-runner host persistent memory is 198240
I0513 02:16:20.135116 92 logging.cc:52] Allocated activation device memory of size 69738496
I0513 02:16:20.143194 92 logging.cc:49] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +85, now: CPU 0, GPU 103 (MiB)
I0513 02:16:20.143240 92 tensorrt.cc:3175] Detected input as execution binding for yolox_0
I0513 02:16:20.143256 92 tensorrt.cc:3175] Detected dets as execution binding for yolox_0
I0513 02:16:20.143264 92 tensorrt.cc:3175] Detected labels as execution binding for yolox_0
I0513 02:16:20.143725 92 tensorrt.cc:1411] Created instance yolox_0 on GPU 0 with stream priority 0 and optimization profile default[0];
I0513 02:16:20.143748 92 backend_model_instance.cc:346] Generating warmup sample data for 'warmup'
I0513 02:16:20.143779 92 pinned_memory_manager.cc:161] pinned memory allocation: size 12902400, addr 0x7fdfa0000090
I0513 02:16:20.146257 92 infer_request.cc:707] prepared: [0x0x7fdf348d2450] request id: , model: yolox, requested version: 1, actual version: 1, flags: 0x0, correlation id: 0, batch size: 1, priority: 0, timeout (us): 0
original inputs:
[0x0x7fdf3f11fca8] input: input, type: FP32, original shape: [1,3,800,1344], batch + shape: [1,3,800,1344], shape: [3,800,1344]
override inputs:
inputs:
[0x0x7fdf3f11fca8] input: input, type: FP32, original shape: [1,3,800,1344], batch + shape: [1,3,800,1344], shape: [3,800,1344]
original requested outputs:
requested outputs:
dets
labels
I0513 02:16:20.146297 92 infer_request.cc:707] prepared: [0x0x7fdf348d2840] request id: , model: yolox, requested version: 1, actual version: 1, flags: 0x0, correlation id: 0, batch size: 1, priority: 0, timeout (us): 0
original inputs:
[0x0x7fdf3e4f8cf8] input: input, type: FP32, original shape: [1,3,800,1344], batch + shape: [1,3,800,1344], shape: [3,800,1344]
override inputs:
inputs:
[0x0x7fdf3e4f8cf8] input: input, type: FP32, original shape: [1,3,800,1344], batch + shape: [1,3,800,1344], shape: [3,800,1344]
original requested outputs:
requested outputs:
dets
labels
I0513 02:16:20.146322 92 infer_request.cc:707] prepared: [0x0x7fdf348d2c50] request id: , model: yolox, requested version: 1, actual version: 1, flags: 0x0, correlation id: 0, batch size: 1, priority: 0, timeout (us): 0
original inputs:
[0x0x7fdf3e5927d8] input: input, type: FP32, original shape: [1,3,800,1344], batch + shape: [1,3,800,1344], shape: [3,800,1344]
override inputs:
inputs:
[0x0x7fdf3e5927d8] input: input, type: FP32, original shape: [1,3,800,1344], batch + shape: [1,3,800,1344], shape: [3,800,1344]
original requested outputs:
requested outputs:
dets
labels
I0513 02:16:20.146359 92 infer_request.cc:707] prepared: [0x0x7fdf348d3070] request id: , model: yolox, requested version: 1, actual version: 1, flags: 0x0, correlation id: 0, batch size: 1, priority: 0, timeout (us): 0
original inputs:
[0x0x7fdf3e6abd78] input: input, type: FP32, original shape: [1,3,800,1344], batch + shape: [1,3,800,1344], shape: [3,800,1344]
override inputs:
inputs:
[0x0x7fdf3e6abd78] input: input, type: FP32, original shape: [1,3,800,1344], batch + shape: [1,3,800,1344], shape: [3,800,1344]
original requested outputs:
requested outputs:
dets
labels
I0513 02:16:20.146397 92 infer_request.cc:707] prepared: [0x0x7fdf3498b1c0] request id: , model: yolox, requested version: 1, actual version: 1, flags: 0x0, correlation id: 0, batch size: 1, priority: 0, timeout (us): 0
original inputs:
[0x0x7fdf3e4ec7c8] input: input, type: FP32, original shape: [1,3,800,1344], batch + shape: [1,3,800,1344], shape: [3,800,1344]
override inputs:
inputs:
[0x0x7fdf3e4ec7c8] input: input, type: FP32, original shape: [1,3,800,1344], batch + shape: [1,3,800,1344], shape: [3,800,1344]
original requested outputs:
requested outputs:
dets
labels
I0513 02:16:20.146438 92 infer_request.cc:707] prepared: [0x0x7fdf3498b820] request id: , model: yolox, requested version: 1, actual version: 1, flags: 0x0, correlation id: 0, batch size: 1, priority: 0, timeout (us): 0
original inputs:
[0x0x7fdf3ceb1b98] input: input, type: FP32, original shape: [1,3,800,1344], batch + shape: [1,3,800,1344], shape: [3,800,1344]
override inputs:
inputs:
[0x0x7fdf3ceb1b98] input: input, type: FP32, original shape: [1,3,800,1344], batch + shape: [1,3,800,1344], shape: [3,800,1344]
original requested outputs:
requested outputs:
dets
labels
I0513 02:16:20.146474 92 infer_request.cc:707] prepared: [0x0x7fdf3498be60] request id: , model: yolox, requested version: 1, actual version: 1, flags: 0x0, correlation id: 0, batch size: 1, priority: 0, timeout (us): 0
original inputs:
[0x0x7fdf3ceaa1e8] input: input, type: FP32, original shape: [1,3,800,1344], batch + shape: [1,3,800,1344], shape: [3,800,1344]
override inputs:
inputs:
[0x0x7fdf3ceaa1e8] input: input, type: FP32, original shape: [1,3,800,1344], batch + shape: [1,3,800,1344], shape: [3,800,1344]
original requested outputs:
requested outputs:
dets
labels
I0513 02:16:20.146518 92 infer_request.cc:707] prepared: [0x0x7fdf3498c4a0] request id: , model: yolox, requested version: 1, actual version: 1, flags: 0x0, correlation id: 0, batch size: 1, priority: 0, timeout (us): 0
original inputs:
[0x0x7fdf3498c808] input: input, type: FP32, original shape: [1,3,800,1344], batch + shape: [1,3,800,1344], shape: [3,800,1344]
override inputs:
inputs:
[0x0x7fdf3498c808] input: input, type: FP32, original shape: [1,3,800,1344], batch + shape: [1,3,800,1344], shape: [3,800,1344]
original requested outputs:
requested outputs:
dets
labels
I0513 02:16:20.146757 92 backend_model_instance.cc:683] Starting backend thread for yolox_0 at nice 0 on device 0...
I0513 02:16:20.146863 92 backend_model_instance.cc:547] model 'yolox' instance yolox_0 is running warmup sample 'warmup'
I0513 02:16:20.146931 92 tensorrt.cc:5525] model yolox, instance yolox_0, executing 8 requests
I0513 02:16:20.146961 92 tensorrt.cc:1599] TRITONBACKEND_ModelExecute: Issuing yolox_0 with 8 requests
I0513 02:16:20.146987 92 tensorrt.cc:1658] TRITONBACKEND_ModelExecute: Running yolox_0 with 8 requests
I0513 02:16:20.147024 92 tensorrt.cc:2785] Optimization profile default [0] is selected for yolox_0
E0513 02:16:20.147086 92 backend_model_instance.cc:99] warmup error: Internal - request specifies invalid shape for input 'input' for yolox_0. Error details: model expected the shape of dimension 0 to be between 1 and 1 but received 8
E0513 02:16:20.147116 92 tensorrt.cc:1993] error setting the binding dimension
E0513 02:16:20.147146 92 backend_model_instance.cc:99] warmup error: Internal - request specifies invalid shape for input 'input' for yolox_0. Error details: model expected the shape of dimension 0 to be between 1 and 1 but received 8
E0513 02:16:20.147159 92 tensorrt.cc:1993] error setting the binding dimension
E0513 02:16:20.147177 92 backend_model_instance.cc:99] warmup error: Internal - request specifies invalid shape for input 'input' for yolox_0. Error details: model expected the shape of dimension 0 to be between 1 and 1 but received 8
E0513 02:16:20.147189 92 tensorrt.cc:1993] error setting the binding dimension
E0513 02:16:20.147213 92 backend_model_instance.cc:99] warmup error: Internal - request specifies invalid shape for input 'input' for yolox_0. Error details: model expected the shape of dimension 0 to be between 1 and 1 but received 8
E0513 02:16:20.147226 92 tensorrt.cc:1993] error setting the binding dimension
E0513 02:16:20.147243 92 backend_model_instance.cc:99] warmup error: Internal - request specifies invalid shape for input 'input' for yolox_0. Error details: model expected the shape of dimension 0 to be between 1 and 1 but received 8
E0513 02:16:20.147269 92 tensorrt.cc:1993] error setting the binding dimension
E0513 02:16:20.147293 92 backend_model_instance.cc:99] warmup error: Internal - request specifies invalid shape for input 'input' for yolox_0. Error details: model expected the shape of dimension 0 to be between 1 and 1 but received 8
E0513 02:16:20.147304 92 tensorrt.cc:1993] error setting the binding dimension
E0513 02:16:20.147325 92 backend_model_instance.cc:99] warmup error: Internal - request specifies invalid shape for input 'input' for yolox_0. Error details: model expected the shape of dimension 0 to be between 1 and 1 but received 8
E0513 02:16:20.147343 92 tensorrt.cc:1993] error setting the binding dimension
E0513 02:16:20.147356 92 backend_model_instance.cc:99] warmup error: Internal - request specifies invalid shape for input 'input' for yolox_0. Error details: model expected the shape of dimension 0 to be between 1 and 1 but received 8
E0513 02:16:20.147368 92 tensorrt.cc:1993] error setting the binding dimension
I0513 02:16:20.147594 92 model_repository_manager.cc:1231] successfully loaded 'yolox' version 1
I0513 02:16:20.147646 92 dynamic_batch_scheduler.cc:280] Starting dynamic-batcher thread for yolox at nice 0...
I0513 02:16:20.147677 92 server.cc:549]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+
I0513 02:16:20.147821 92 server.cc:576]
+-------------+------------------------------------------------------+--------+
| Backend | Path | Config |
+-------------+------------------------------------------------------+--------+
| pytorch | /opt/tritonserver/backends/pytorch/libtriton_pytorch | {} |
| | .so | |
| tensorflow | /opt/tritonserver/backends/tensorflow1/libtriton_ten | {} |
| | sorflow1.so | |
| onnxruntime | /opt/tritonserver/backends/onnxruntime/libtriton_onn | {} |
| | xruntime.so | |
| openvino | /opt/tritonserver/backends/openvino_2021_4/libtriton | {} |
| | _openvino_2021_4.so | |
| tensorrt | /opt/tritonserver/backends/tensorrt/libtriton_tensor | {} |
| | rt.so | |
+-------------+------------------------------------------------------+--------+
I0513 02:16:20.147886 92 server.cc:619]
+-------+---------+--------+
| Model | Version | Status |
+-------+---------+--------+
| yolox | 1 | READY |
+-------+---------+--------+
I0513 02:16:20.148027 92 tritonserver.cc:2123]
+----------------------------------+------------------------------------------+
| Option | Value |
+----------------------------------+------------------------------------------+
| server_id | triton |
| server_version | 2.21.0 |
| server_extensions | classification sequence model_repository |
| | model_repository(unload_dependents) sch |
| | edule_policy model_configuration system_ |
| | shared_memory cuda_shared_memory binary_ |
| | tensor_data statistics trace |
| model_repository_path[0] | /opt/ml/model/ |
| model_control_mode | MODE_EXPLICIT |
| startup_models_0 | yolox |
| strict_model_config | 1 |
| rate_limit | OFF |
| pinned_memory_pool_byte_size | 268435456 |
| cuda_memory_pool_byte_size{0} | 67108864 |
| response_cache_byte_size | 0 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
+----------------------------------+------------------------------------------+
I0513 02:16:20.148465 92 sagemaker_server.cc:136] Started Sagemaker HTTPService at 0.0.0.0:8080
Thank you for providing the verbose output.
warmup error: Internal - request specifies invalid shape for input 'input' for yolox_0. Error details: model expected the shape of dimension 0 to be between 1 and 1 but received 8
From the errors you are receiving, it appears that your model doesn't support batching, and the TensorRT backend is complaining about this. To confirm, you can load the model outside of Triton with native TensorRT; you should see the same failure when sending a batch size greater than 1. In that case, if you want batching support you may need to regenerate the model.
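If it helps, here is a minimal sketch of that check with the TensorRT Python API (assuming TensorRT 8.x and the mmdeploy plugin library from your LD_PRELOAD; not tested against your engine):
import ctypes
import tensorrt as trt

# Load the custom ops the engine was built with (library name taken from
# the LD_PRELOAD in this thread; adjust the path if needed).
ctypes.CDLL("libmmdeploy_tensorrt_ops.so")

logger = trt.Logger(trt.Logger.WARNING)
trt.init_libnvinfer_plugins(logger, "")
runtime = trt.Runtime(logger)
with open("end2end.engine", "rb") as f:
    engine = runtime.deserialize_cuda_engine(f.read())

context = engine.create_execution_context()
# If the engine only supports batch size 1, this fails just like Triton's
# warmup does when it submits a batch of 8.
ok = context.set_binding_shape(0, trt.Dims([8, 3, 800, 1344]))
print("batch-8 binding accepted:", ok)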
@tanmayv25 Can you confirm this?
@austinmw In order for a model to support dynamic batching, the batch dimension (index 0) should be dynamic. Additionally, the optimization profiles in the plan file should cover the whole range [1, max_batch_size] for the batch dimension.
As @nv-kmcgill53 pointed out, your TRT engine's first dimension does not appear to be dynamic; it is fixed at 1.
You can use polygraphy inspect on your model to study the supported shapes.
This is how a network supporting dynamic batching should look in polygraphy:
==== TensorRT Network ====
Name: Unnamed Network 0 | Explicit Batch Network
---- 3 Network Input(s) ----
{input_id [dtype=int32, shape=(-1, 128)],
attention_mask [dtype=int32, shape=(-1, 128)],
token_type_ids [dtype=int32, shape=(-1, 128)]}
---- 1 Network Output(s) ----
{output [dtype=float32, shape=(-1, 128, 384)]}
Source: https://github.com/triton-inference-server/server/issues/3928
Ah okay thanks for your response! I used this config file from the MMDeploy framework for an MMDetection YOLOX model. I thought the "static" and "dynamic" terminology referred to the height and width dimensions, but I guess as you say it refers to the batch dimension as well.
For some reason, when upgrading my TRT version from 21.04 to either 21.08 or 22.04, the dynamic batching config gave me an error (it seems to work with 21.04). Do you happen to be aware of any changes that would have broken this? Maybe this is a question better suited for MMDeploy, but I'd really appreciate any advice! Here's the MMDeploy issue I created: https://github.com/open-mmlab/mmdeploy/issues/460
The error with the dynamic config is:
[TensorRT] ERROR: 4: [shapeCompiler.cpp::evaluateShapeChecks::822] Error Code 4: Internal Error (kOPT values for profile 0 violate shape constraints: condition '==' violated. Concat_351: dimensions not compatible for concatenation) 2022-05-10:18:38:31,root ERROR [utils.py:43] Failed to create TensorRT engine
I am not familiar with mmdeploy, but the error message suggests that the shapes [kMIN, kOPT, kMAX] were fed incorrectly to TensorRT. If not using dynamic shapes, the constraint is kMIN == kOPT == kMAX. If using dynamic shapes, the constraint is kMIN <= kOPT <= kMAX.
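For reference, this is roughly how an optimization profile encodes those constraints with the TensorRT Python API (a sketch with example shapes, not mmdeploy's actual code):
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
config = builder.create_builder_config()
profile = builder.create_optimization_profile()
# kMIN <= kOPT <= kMAX must hold for every dimension of every input.
profile.set_shape("input",
                  min=(1, 3, 160, 160),
                  opt=(8, 3, 416, 416),
                  max=(32, 3, 640, 640))
config.add_optimization_profile(profile)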
Closing the issue for now as I believe your question has been answered w.r.t. Triton. Please create another issue if you have any other questions.
@tanmayv25 Hi, sorry, one more question: when using dynamic shapes and dynamic_batching, should my input dims for image batches be [3, -1, -1], which gets translated to [-1, 3, -1, -1] when max_batch_size > 1?
I used a dynamic shapes config to generate a new engine file, but now I get the error:
failed to load 'yolox' version 1: Internal: trt failed to set binding dimension to [1,3,608,608] for input 'input' for yolox_0
The model does not support the shape [1,3,608,608]. Please use polygraphy inspect to check the binding dimensions supported in your model plan file.
You can also utilize the auto-complete feature of Triton by not providing the model config.pbtxt at all and running tritonserver with --strict-model-config=false --log-verbose=1 to see what dimensions are supported in the TRT engine.
@tanmayv25 Thanks for your help. I tried running without a config file and added --strict-model-config=false. Here's the autogenerated config:
I0515 06:24:47.815970 90 tensorrt.cc:475] post auto-complete:
{
"name": "yolox",
"platform": "tensorrt_plan",
"backend": "tensorrt",
"version_policy": {
"latest": {
"num_versions": 1
}
},
"max_batch_size": 1,
"input": [
{
"name": "input",
"data_type": "TYPE_FP32",
"dims": [
3,
-1,
-1
],
"is_shape_tensor": false
}
],
"output": [
{
"name": "dets",
"data_type": "TYPE_FP32",
"dims": [
100,
5
],
"is_shape_tensor": false
},
{
"name": "labels",
"data_type": "TYPE_INT32",
"dims": [
100
],
"is_shape_tensor": false
}
],
"batch_input": [],
"batch_output": [],
"optimization": {
"priority": "PRIORITY_DEFAULT",
"input_pinned_memory": {
"enable": true
},
"output_pinned_memory": {
"enable": true
},
"gather_kernel_buffer_threshold": 0,
"eager_batching": false
},
"instance_group": [
{
"name": "yolox_0",
"kind": "KIND_GPU",
"count": 1,
"gpus": [
0
],
"secondary_devices": [],
"profile": [],
"passive": false,
"host_policy": ""
}
],
"default_model_filename": "end2end.engine",
"cc_model_filenames": {},
"metric_tags": {},
"parameters": {},
"model_warmup": []
}
I0515 06:24:47.816455 90 tensorrt.cc:439] model configuration:
{
"name": "yolox",
"platform": "tensorrt_plan",
"backend": "tensorrt",
"version_policy": {
"latest": {
"num_versions": 1
}
},
"max_batch_size": 1,
"input": [
{
"name": "input",
"data_type": "TYPE_FP32",
"dims": [
3,
-1,
-1
],
"is_shape_tensor": false
}
],
"output": [
{
"name": "dets",
"data_type": "TYPE_FP32",
"dims": [
100,
5
],
"is_shape_tensor": false
},
{
"name": "labels",
"data_type": "TYPE_INT32",
"dims": [
100
],
"is_shape_tensor": false
}
],
"batch_input": [],
"batch_output": [],
"optimization": {
"priority": "PRIORITY_DEFAULT",
"input_pinned_memory": {
"enable": true
},
"output_pinned_memory": {
"enable": true
},
"gather_kernel_buffer_threshold": 0,
"eager_batching": false
},
"instance_group": [
{
"name": "yolox_0",
"kind": "KIND_GPU",
"count": 1,
"gpus": [
0
],
"secondary_devices": [],
"profile": [],
"passive": false,
"host_policy": ""
}
],
"default_model_filename": "end2end.engine",
"cc_model_filenames": {},
"metric_tags": {},
"parameters": {},
"model_warmup": []
}
However, I actually still get a similar error:
E0515 06:24:47.887078 90 model_repository_manager.cc:1234] failed to load 'yolox' version 1: Internal: trt failed to set binding dimension to [1,3,608,608] for input 'input' for yolox_0
I also tried to use polygraphy inspect, but got an IPluginCreator error, and prepending LD_PRELOAD=/root/workspace/mmdeploy/build/lib/libmmdeploy_mmdet.so didn't seem to work as it does with tritonserver.
You didn't provide the model config.pbtxt in the model repo for the server when running with --strict-model-config=false, right?
From the auto-generated config it looks like the model supports [1, 3, -1, -1]. However, it fails to set the dimension to [1, 3, 608, 608]. The most likely explanation is that 608 falls outside the <kMIN, kMAX> range for dims[2] and dims[3].
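Since polygraphy gave you an IPluginCreator error, here is a short sketch with the TensorRT Python API (plugin path assumed from your earlier message) that prints the same range information:
import ctypes
import tensorrt as trt

# Load the custom ops first, otherwise deserialization fails with an
# IPluginCreator error (path is an assumption; adjust to your build).
ctypes.CDLL("/root/workspace/mmdeploy/build/lib/libmmdeploy_tensorrt_ops.so")

logger = trt.Logger(trt.Logger.WARNING)
trt.init_libnvinfer_plugins(logger, "")
runtime = trt.Runtime(logger)
with open("end2end.engine", "rb") as f:
    engine = runtime.deserialize_cuda_engine(f.read())

# Each entry is (kMIN, kOPT, kMAX) for the 'input' binding of one profile.
for p in range(engine.num_optimization_profiles):
    print(p, engine.get_profile_shape(p, "input"))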
@tanmayv25 Hmmmm, yeah, when I provided no config.pbtxt at all it complained that platform was an empty string, so I just provided a very short config with no input or output definitions and "platform": "tensorrt_plan" while setting --strict-model-config=false.
When alternatively experimenting with true instead, I attempted multiple input shapes, including the max shape specified in my tensorrt config file as well as [3, -1, -1].
This means we would need to look at the shapes in the model itself somehow.
I also tried to use polygraphy inspect, but got an IPluginCreator error, and prepending LD_PRELOAD=/root/workspace/mmdeploy/build/lib/libmmdeploy_mmdet.so didn't seem to work as it does with tritonserver.
You can try using --plugins, e.g.:
polygraphy inspect model my_model.engine --plugins /path/to/plugins.so
@tanmayv25 Hi, thanks for the plugin help! Here's a summary of my inputs/outputs:
My model is MMDetection YOLOX-tiny
Here's my conversion config file:
_base_ = ['../_base_/base_dynamic.py', '../../_base_/backends/tensorrt.py']
backend_config = dict(
common_config=dict(max_workspace_size=1 << 33),
model_inputs=[
dict(
input_shapes=dict(
input=dict(
min_shape=[1, 3, 160, 160],
opt_shape=[32, 3, 416, 416],
max_shape=[32, 3, 640, 640])))
])
And here's the polygraphy inspect output:
[I] Loading plugin library: /root/workspace/mmdeploy/build/lib/libmmdeploy_tensorrt_ops.so
[I] Loading bytes from /volume_share/end2end.engine
[I] ==== TensorRT Engine ====
Name: Unnamed Network 0 | Explicit Batch Engine
---- 1 Engine Input(s) ----
{input [dtype=float32, shape=(-1, 3, -1, -1)]}
---- 2 Engine Output(s) ----
{dets [dtype=float32, shape=(-1, 100, 5)],
labels [dtype=int32, shape=(-1, 100)]}
---- Memory ----
Device Memory: 3294651904 bytes
---- 1 Profile(s) (3 Binding(s) Each) ----
- Profile: 0
Binding Index: 0 (Input) [Name: input] | Shapes: min=(1, 3, 160, 160), opt=(32, 3, 416, 416), max=(32, 3, 640, 640)
Binding Index: 1 (Output) [Name: dets] | Shape: (-1, 100, 5)
Binding Index: 2 (Output) [Name: labels] | Shape: (-1, 100)
---- 267 Layer(s) ----
name: "yolox"
platform: "tensorrt_plan"
input [{
name: "input"
data_type: TYPE_FP32
format: FORMAT_NONE
dims: [3, -1, -1]
}]
output [
{
name: "dets"
data_type: TYPE_FP32
dims: [100, 5]
},
{
name: "labels"
data_type: TYPE_INT32
dims: [100]
}
]
max_batch_size: 32
instance_group {
count: 1
kind: KIND_GPU
}
dynamic_batching {
}
default_model_filename: "end2end.engine"
And here's the command I'm running:
LD_PRELOAD=libmmdeploy_tensorrt_ops.so tritonserver --allow-sagemaker=true --allow-grpc=false --allow-http=false --allow-metrics=false --model-control-mode=explicit $SAGEMAKER_ARGS
Yet I still end up with the following error:
=============================
== Triton Inference Server ==
=============================
NVIDIA Release 22.04 (build 36821869)
Triton Server Version 2.21.0
Copyright (c) 2018-2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
NOTE: CUDA Forward Compatibility mode ENABLED.
Using CUDA 11.6 driver version 510.47.03 with kernel driver version 450.142.00.
See https://docs.nvidia.com/deploy/cuda-compatibility/ for details.
I0526 23:11:56.811105 90 libtorch.cc:1381] TRITONBACKEND_Initialize: pytorch
I0526 23:11:56.811210 90 libtorch.cc:1391] Triton TRITONBACKEND API version: 1.9
I0526 23:11:56.811244 90 libtorch.cc:1397] 'pytorch' TRITONBACKEND API version: 1.9
2022-05-26 23:11:57.038822: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
I0526 23:11:57.085032 90 tensorflow.cc:2181] TRITONBACKEND_Initialize: tensorflow
I0526 23:11:57.085081 90 tensorflow.cc:2191] Triton TRITONBACKEND API version: 1.9
I0526 23:11:57.085103 90 tensorflow.cc:2197] 'tensorflow' TRITONBACKEND API version: 1.9
I0526 23:11:57.085123 90 tensorflow.cc:2221] backend configuration:
{}
I0526 23:11:57.086930 90 onnxruntime.cc:2400] TRITONBACKEND_Initialize: onnxruntime
I0526 23:11:57.086958 90 onnxruntime.cc:2410] Triton TRITONBACKEND API version: 1.9
I0526 23:11:57.086977 90 onnxruntime.cc:2416] 'onnxruntime' TRITONBACKEND API version: 1.9
I0526 23:11:57.086995 90 onnxruntime.cc:2446] backend configuration:
{}
I0526 23:11:57.108611 90 openvino.cc:1207] TRITONBACKEND_Initialize: openvino
I0526 23:11:57.108641 90 openvino.cc:1217] Triton TRITONBACKEND API version: 1.9
I0526 23:11:57.108648 90 openvino.cc:1223] 'openvino' TRITONBACKEND API version: 1.9
I0526 23:11:59.002514 90 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7f62ee000000' with size 268435456
I0526 23:11:59.003010 90 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
I0526 23:11:59.005139 90 model_repository_manager.cc:1077] loading: yolox:1
I0526 23:11:59.106200 90 tensorrt.cc:5294] TRITONBACKEND_Initialize: tensorrt
I0526 23:11:59.106251 90 tensorrt.cc:5304] Triton TRITONBACKEND API version: 1.9
I0526 23:11:59.106279 90 tensorrt.cc:5310] 'tensorrt' TRITONBACKEND API version: 1.9
I0526 23:11:59.106416 90 tensorrt.cc:5353] backend configuration:
{}
I0526 23:11:59.106462 90 tensorrt.cc:5405] TRITONBACKEND_ModelInitialize: yolox (version 1)
I0526 23:11:59.108117 90 tensorrt.cc:5454] TRITONBACKEND_ModelInstanceInitialize: yolox_0 (GPU device 0)
I0526 23:11:59.522283 90 logging.cc:49] [MemUsageChange] Init CUDA: CPU +252, GPU +0, now: CPU 1411, GPU 1013 (MiB)
I0526 23:11:59.597039 90 logging.cc:49] Loaded engine size: 38 MiB
I0526 23:12:00.272506 90 logging.cc:49] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +385, GPU +180, now: CPU 1885, GPU 1231 (MiB)
I0526 23:12:00.465612 90 logging.cc:49] [MemUsageChange] Init cuDNN: CPU +116, GPU +52, now: CPU 2001, GPU 1283 (MiB)
I0526 23:12:00.467141 90 logging.cc:49] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +35, now: CPU 0, GPU 35 (MiB)
I0526 23:12:00.473948 90 logging.cc:49] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +10, now: CPU 1924, GPU 1275 (MiB)
I0526 23:12:00.475812 90 logging.cc:49] [MemUsageChange] Init cuDNN: CPU +1, GPU +8, now: CPU 1925, GPU 1283 (MiB)
E0526 23:12:00.534641 90 logging.cc:43] 1: [raiiMyelinGraph.h::RAIIMyelinGraph::24] Error Code 1: Myelin (Compiled against cuDNN 10.2.2.0 but running against cuDNN 11.9.3.0.)
I0526 23:12:00.534702 90 logging.cc:49] Could not set default profile 0 for execution context. Profile index must be set explicitly.
I0526 23:12:00.534742 90 logging.cc:49] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +3178, now: CPU 0, GPU 3213 (MiB)
E0526 23:12:00.534786 90 logging.cc:43] 3: [executionContext.cpp::setBindingDimensions::926] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setBindingDimensions::926, condition: mOptimizationProfile >= 0 && mOptimizationProfile < mEngine.getNbOptimizationProfiles()
)
I0526 23:12:00.534818 90 tensorrt.cc:5492] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I0526 23:12:00.534851 90 tensorrt.cc:5431] TRITONBACKEND_ModelFinalize: delete model state
E0526 23:12:00.535852 90 model_repository_manager.cc:1234] failed to load 'yolox' version 1: Internal: trt failed to set binding dimension to [32,3,640,640] for input 'input' for yolox_0
I0526 23:12:00.536131 90 tritonserver.cc:2123]
+----------------------------------+------------------------------------------+
| Option | Value |
+----------------------------------+------------------------------------------+
| server_id | triton |
| server_version | 2.21.0 |
| server_extensions | classification sequence model_repository |
| | model_repository(unload_dependents) sch |
| | edule_policy model_configuration system_ |
| | shared_memory cuda_shared_memory binary_ |
| | tensor_data statistics trace |
| model_repository_path[0] | /opt/ml/model/ |
| model_control_mode | MODE_EXPLICIT |
| startup_models_0 | yolox |
| strict_model_config | 1 |
| rate_limit | OFF |
| pinned_memory_pool_byte_size | 268435456 |
| cuda_memory_pool_byte_size{0} | 67108864 |
| response_cache_byte_size | 0 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
+----------------------------------+------------------------------------------+
I0526 23:12:00.536194 90 server.cc:247] No server context available. Exiting immediately.
error: creating server: Invalid argument - load failed for model 'yolox': version 1: Internal: trt failed to set binding dimension to [32,3,640,640] for input 'input' for yolox_0;
It turned out to be a Myelin graph cuDNN compatibility error that I was not able to see until testing with trtexec inference.
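For reference, a trtexec run along these lines (standard trtexec flags; shape values assumed from my conversion config) surfaces the detailed error outside of Triton:
trtexec --loadEngine=end2end.engine --plugins=libmmdeploy_tensorrt_ops.so --shapes=input:1x3x640x640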
@austinmw I am a little curious. Can you describe the issue? Is there a known issue that you are hitting with the Myelin graph?
I think the MMDeploy docker container I built had some version mismatch, but it didn't fail to build. I don't think I have the previous Dockerfile anymore to check the exact error, but when running trtexec inference I was able to see the error in detail, which I couldn't see with Triton logging.
Hi, I'm trying to deploy an MMDetection yolox-s model which I converted to an end2end.engine file using MMDeploy. I added the libmmdeploy_tensorrt_ops.so that was used to generate my engine file into my Triton docker image and use LD_PRELOAD.
I attempted to set my config.pbtxt like this:
However this gives me warmup errors:
Can anyone tell me how I should be setting each dimension to make use of dynamic batching? I've attempted several different combinations of values, but can't seem to get it right.