triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
BSD 3-Clause "New" or "Revised" License

Server stuck in inference phase after client request #6738

Closed rungrodkspeed closed 9 months ago

rungrodkspeed commented 9 months ago

**Description**
The server gets stuck in the inference phase after a request is sent from the client (Python API).

- models
    - resnet50
        - 1
           - resnet50.plan
        - config.pbtxt

] output [ { name: "output" data_type: TYPE_FP32 dims: [ 8, 1000 ] } ]

default_model_filename: "resnet50.plan"

I've tried both dynamic batching and static batching. The problem is the same either way.
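
For context on the batching question: in Triton, `max_batch_size: 0` means `dims` is the complete tensor shape, while `max_batch_size > 0` means the batch dimension is implicit and is prepended to `dims`. The server log for this model reports `max_batch_size: 0` with `dims: [8, 3, 224, 224]`. A minimal sketch of that rule follows; `expected_request_shape` is a hypothetical helper name for illustration, not a Triton API, and variable-size dimensions (`-1`) are deliberately not handled.

```python
def expected_request_shape(dims, max_batch_size, batch=None):
    """Return the full tensor shape a request should carry for a given config.

    Hypothetical helper: it only illustrates how dims and max_batch_size combine.
    """
    if max_batch_size == 0:
        # No implicit batch dimension: dims is already the complete shape.
        return list(dims)
    if batch is None or not 1 <= batch <= max_batch_size:
        raise ValueError("batch must be between 1 and max_batch_size")
    # Batching enabled: the batch dimension is prepended to dims.
    return [batch] + list(dims)


# Non-batching config, as reported in the server log for this model.
print(expected_request_shape([8, 3, 224, 224], max_batch_size=0))
# A batching config that would accept the same request shape.
print(expected_request_shape([3, 224, 224], max_batch_size=8, batch=8))
```

Both calls describe the same `[8, 3, 224, 224]` request, which is why the client code does not need to change between the two configurations.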

- This is my model plan file:
https://drive.google.com/file/d/1I14RQrQ-lEH5R4pYaEAdKxF_aMM0mMP0/view?usp=sharing

**Triton Information**
I have tried several container versions (23.06-py3, 23.07-py3, etc.),
and every version I tried shows the same problem.

**To Reproduce**

1.) I run the container via
`docker run --gpus=1 --rm -p8000:8000 -p8001:8001 -p8002:8002 -v/mnt/d/github/cnn_classification_flower/deploy_triton/models:/models nvcr.io/nvidia/tritonserver:23.06-py3 tritonserver --model-repository=/models --log-verbose=1 --exit-on-error=true`

After running the command, these are the server's logs.

=============================
== Triton Inference Server ==
=============================

NVIDIA Release 23.06 (build 62878575)
Triton Server Version 2.35.0

Copyright (c) 2018-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License. By pulling and using the container, you accept the terms and conditions of this license: https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

    I1225 18:20:54.779188 1 shared_library.cc:112] OpenLibraryHandle: /opt/tritonserver/backends/pytorch/libtriton_pytorch.so
    I1225 18:21:04.549382 1 libtorch.cc:2253] TRITONBACKEND_Initialize: pytorch
    I1225 18:21:04.549405 1 libtorch.cc:2263] Triton TRITONBACKEND API version: 1.13
    I1225 18:21:04.549411 1 libtorch.cc:2269] 'pytorch' TRITONBACKEND API version: 1.13
    I1225 18:21:04.549423 1 cache_manager.cc:478] Create CacheManager with cache_dir: '/opt/tritonserver/caches'
    I1225 18:21:07.897197 1 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x205000000' with size 268435456
    I1225 18:21:07.899146 1 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
    I1225 18:21:07.933394 1 model_config_utils.cc:647] Server side auto-completed config: name: "resnet50"
    platform: "tensorrt_plan"
    input {
      name: "input"
      data_type: TYPE_FP32
      dims: 8
      dims: 3
      dims: 224
      dims: 224
    }
    output {
      name: "output"
      data_type: TYPE_FP32
      dims: 8
      dims: 1000
    }
    default_model_filename: "resnet50.plan"
    backend: "tensorrt"

I1225 18:21:07.938440 1 model_lifecycle.cc:462] loading: resnet50:1 I1225 18:21:07.940320 1 backend_model.cc:362] Adding default backend config setting: default-max-batch-size,4 I1225 18:21:07.940373 1 shared_library.cc:112] OpenLibraryHandle: /opt/tritonserver/backends/tensorrt/libtriton_tensorrt.so I1225 18:21:07.942606 1 tensorrt.cc:65] TRITONBACKEND_Initialize: tensorrt I1225 18:21:07.942636 1 tensorrt.cc:75] Triton TRITONBACKEND API version: 1.13 I1225 18:21:07.942640 1 tensorrt.cc:81] 'tensorrt' TRITONBACKEND API version: 1.13 I1225 18:21:07.942643 1 tensorrt.cc:105] backend configuration: {"cmdline":{"auto-complete-config":"true","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","default-max-batch-size":"4"}} I1225 18:21:07.942672 1 tensorrt.cc:178] Registering TensorRT Plugins I1225 18:21:07.943169 1 logging.cc:49] Plugin creator already registered - ::BatchedNMSDynamic_TRT version 1 I1225 18:21:07.943193 1 logging.cc:49] Plugin creator already registered - ::BatchedNMS_TRT version 1 I1225 18:21:07.943198 1 logging.cc:49] Plugin creator already registered - ::BatchTilePlugin_TRT version 1 I1225 18:21:07.943202 1 logging.cc:49] Plugin creator already registered - ::Clip_TRT version 1 I1225 18:21:07.943206 1 logging.cc:49] Plugin creator already registered - ::CoordConvAC version 1 I1225 18:21:07.943212 1 logging.cc:49] Plugin creator already registered - ::CropAndResizeDynamic version 1 I1225 18:21:07.943216 1 logging.cc:49] Plugin creator already registered - ::CropAndResize version 1 I1225 18:21:07.943220 1 logging.cc:49] Plugin creator already registered - ::DecodeBbox3DPlugin version 1 I1225 18:21:07.943225 1 logging.cc:49] Plugin creator already registered - ::DetectionLayer_TRT version 1 I1225 18:21:07.943229 1 logging.cc:49] Plugin creator already registered - ::EfficientNMS_Explicit_TF_TRT version 1 I1225 18:21:07.943234 1 logging.cc:49] Plugin creator already registered - ::EfficientNMS_Implicit_TF_TRT version 1 
I1225 18:21:07.943237 1 logging.cc:49] Plugin creator already registered - ::EfficientNMS_ONNX_TRT version 1 I1225 18:21:07.943253 1 logging.cc:49] Plugin creator already registered - ::EfficientNMS_TRT version 1 I1225 18:21:07.943258 1 logging.cc:49] Plugin creator already registered - ::FlattenConcat_TRT version 1 I1225 18:21:07.943273 1 logging.cc:49] Plugin creator already registered - ::GenerateDetection_TRT version 1 I1225 18:21:07.943278 1 logging.cc:49] Plugin creator already registered - ::GridAnchor_TRT version 1 I1225 18:21:07.943291 1 logging.cc:49] Plugin creator already registered - ::GridAnchorRect_TRT version 1 I1225 18:21:07.943296 1 logging.cc:49] Plugin creator already registered - ::InstanceNormalization_TRT version 1 I1225 18:21:07.943298 1 logging.cc:49] Plugin creator already registered - ::InstanceNormalization_TRT version 2 I1225 18:21:07.943303 1 logging.cc:49] Plugin creator already registered - ::LReLU_TRT version 1 I1225 18:21:07.943307 1 logging.cc:49] Plugin creator already registered - ::ModulatedDeformConv2d version 1 I1225 18:21:07.943311 1 logging.cc:49] Plugin creator already registered - ::MultilevelCropAndResize_TRT version 1 I1225 18:21:07.943314 1 logging.cc:49] Plugin creator already registered - ::MultilevelProposeROI_TRT version 1 I1225 18:21:07.943318 1 logging.cc:49] Plugin creator already registered - ::MultiscaleDeformableAttnPlugin_TRT version 1 I1225 18:21:07.943322 1 logging.cc:49] Plugin creator already registered - ::NMSDynamic_TRT version 1 I1225 18:21:07.943328 1 logging.cc:49] Plugin creator already registered - ::NMS_TRT version 1 I1225 18:21:07.943334 1 logging.cc:49] Plugin creator already registered - ::Normalize_TRT version 1 I1225 18:21:07.943356 1 logging.cc:49] Plugin creator already registered - ::PillarScatterPlugin version 1 I1225 18:21:07.943363 1 logging.cc:49] Plugin creator already registered - ::PriorBox_TRT version 1 I1225 18:21:07.943370 1 logging.cc:49] Plugin creator already registered - 
::ProposalDynamic version 1 I1225 18:21:07.943375 1 logging.cc:49] Plugin creator already registered - ::ProposalLayer_TRT version 1 I1225 18:21:07.943379 1 logging.cc:49] Plugin creator already registered - ::Proposal version 1 I1225 18:21:07.943384 1 logging.cc:49] Plugin creator already registered - ::PyramidROIAlign_TRT version 1 I1225 18:21:07.943392 1 logging.cc:49] Plugin creator already registered - ::Region_TRT version 1 I1225 18:21:07.943398 1 logging.cc:49] Plugin creator already registered - ::Reorg_TRT version 1 I1225 18:21:07.943405 1 logging.cc:49] Plugin creator already registered - ::ResizeNearest_TRT version 1 I1225 18:21:07.943411 1 logging.cc:49] Plugin creator already registered - ::ROIAlign_TRT version 1 I1225 18:21:07.943416 1 logging.cc:49] Plugin creator already registered - ::RPROI_TRT version 1 I1225 18:21:07.943421 1 logging.cc:49] Plugin creator already registered - ::ScatterND version 1 I1225 18:21:07.943425 1 logging.cc:49] Plugin creator already registered - ::SpecialSlice_TRT version 1 I1225 18:21:07.943430 1 logging.cc:49] Plugin creator already registered - ::Split version 1 I1225 18:21:07.943435 1 logging.cc:49] Plugin creator already registered - ::VoxelGeneratorPlugin version 1 I1225 18:21:07.981194 1 tensorrt.cc:222] TRITONBACKEND_ModelInitialize: resnet50 (version 1) I1225 18:21:07.983613 1 model_config_utils.cc:1839] ModelConfig 64-bit fields: I1225 18:21:07.983653 1 model_config_utils.cc:1841] ModelConfig::dynamic_batching::default_queue_policy::default_timeout_microseconds I1225 18:21:07.983658 1 model_config_utils.cc:1841] ModelConfig::dynamic_batching::max_queue_delay_microseconds I1225 18:21:07.983661 1 model_config_utils.cc:1841] ModelConfig::dynamic_batching::priority_queue_policy::value::default_timeout_microseconds I1225 18:21:07.983664 1 model_config_utils.cc:1841] ModelConfig::ensemble_scheduling::step::model_version I1225 18:21:07.983666 1 model_config_utils.cc:1841] ModelConfig::input::dims I1225 18:21:07.983668 
1 model_config_utils.cc:1841] ModelConfig::input::reshape::shape I1225 18:21:07.983671 1 model_config_utils.cc:1841] ModelConfig::instance_group::secondary_devices::device_id I1225 18:21:07.983673 1 model_config_utils.cc:1841] ModelConfig::model_warmup::inputs::value::dims I1225 18:21:07.983675 1 model_config_utils.cc:1841] ModelConfig::optimization::cuda::graph_spec::graph_lower_bound::input::value::dim I1225 18:21:07.983677 1 model_config_utils.cc:1841] ModelConfig::optimization::cuda::graph_spec::input::value::dim I1225 18:21:07.983680 1 model_config_utils.cc:1841] ModelConfig::output::dims I1225 18:21:07.983682 1 model_config_utils.cc:1841] ModelConfig::output::reshape::shape I1225 18:21:07.983684 1 model_config_utils.cc:1841] ModelConfig::sequence_batching::direct::max_queue_delay_microseconds I1225 18:21:07.983686 1 model_config_utils.cc:1841] ModelConfig::sequence_batching::max_sequence_idle_microseconds I1225 18:21:07.983689 1 model_config_utils.cc:1841] ModelConfig::sequence_batching::oldest::max_queue_delay_microseconds I1225 18:21:07.983691 1 model_config_utils.cc:1841] ModelConfig::sequence_batching::state::dims I1225 18:21:07.983693 1 model_config_utils.cc:1841] ModelConfig::sequence_batching::state::initial_state::dims I1225 18:21:07.983696 1 model_config_utils.cc:1841] ModelConfig::version_policy::specific::versions I1225 18:21:07.984427 1 model_state.cc:308] Setting the CUDA device to GPU0 to auto-complete config for resnet50 I1225 18:21:07.984483 1 model_state.cc:354] Using explicit serialized file 'resnet50.plan' to auto-complete config for resnet50 I1225 18:21:12.914818 1 logging.cc:46] Loaded engine size: 50 MiB I1225 18:21:14.583314 1 logging.cc:49] Deserialization required 127678 microseconds. 
I1225 18:21:14.583355 1 logging.cc:46] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +48, now: CPU 0, GPU 48 (MiB) W1225 18:21:14.587585 1 model_state.cc:522] The specified dimensions in model config for resnet50 hints that batching is unavailable I1225 18:21:14.590742 1 model_state.cc:379] post auto-complete: { "name": "resnet50", "platform": "tensorrt_plan", "backend": "tensorrt", "version_policy": { "latest": { "num_versions": 1 } }, "max_batch_size": 0, "input": [ { "name": "input", "data_type": "TYPE_FP32", "format": "FORMAT_NONE", "dims": [ 8, 3, 224, 224 ], "is_shape_tensor": false, "allow_ragged_batch": false, "optional": false } ], "output": [ { "name": "output", "data_type": "TYPE_FP32", "dims": [ 8, 1000 ], "label_filename": "", "is_shape_tensor": false } ], "batch_input": [], "batch_output": [], "optimization": { "priority": "PRIORITY_DEFAULT", "input_pinned_memory": { "enable": true }, "output_pinned_memory": { "enable": true }, "gather_kernel_buffer_threshold": 0, "eager_batching": false }, "instance_group": [ { "name": "resnet50", "kind": "KIND_GPU", "count": 1, "gpus": [ 0 ], "secondary_devices": [], "profile": [], "passive": false, "host_policy": "" } ], "default_model_filename": "resnet50.plan", "cc_model_filenames": {}, "metric_tags": {}, "parameters": {}, "model_warmup": [] } I1225 18:21:14.591284 1 model_state.cc:272] model configuration: { "name": "resnet50", "platform": "tensorrt_plan", "backend": "tensorrt", "version_policy": { "latest": { "num_versions": 1 } }, "max_batch_size": 0, "input": [ { "name": "input", "data_type": "TYPE_FP32", "format": "FORMAT_NONE", "dims": [ 8, 3, 224, 224 ], "is_shape_tensor": false, "allow_ragged_batch": false, "optional": false } ], "output": [ { "name": "output", "data_type": "TYPE_FP32", "dims": [ 8, 1000 ], "label_filename": "", "is_shape_tensor": false } ], "batch_input": [], "batch_output": [], "optimization": { "priority": "PRIORITY_DEFAULT", "input_pinned_memory": 
{ "enable": true }, "output_pinned_memory": { "enable": true }, "gather_kernel_buffer_threshold": 0, "eager_batching": false }, "instance_group": [ { "name": "resnet50", "kind": "KIND_GPU", "count": 1, "gpus": [ 0 ], "secondary_devices": [], "profile": [], "passive": false, "host_policy": "" } ], "default_model_filename": "resnet50.plan", "cc_model_filenames": {}, "metric_tags": {}, "parameters": {}, "model_warmup": [] } I1225 18:21:14.593163 1 tensorrt.cc:288] TRITONBACKEND_ModelInstanceInitialize: resnet50 (GPU device 0) I1225 18:21:14.593203 1 backend_model_instance.cc:105] Creating instance resnet50 on GPU 0 (8.6) using artifact 'resnet50.plan' I1225 18:21:14.593220 1 instance_state.cc:256] Zero copy optimization is disabled I1225 18:21:14.669096 1 logging.cc:46] Loaded engine size: 50 MiB I1225 18:21:14.711327 1 logging.cc:49] Deserialization required 42173 microseconds. I1225 18:21:14.711366 1 logging.cc:46] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +48, now: CPU 0, GPU 48 (MiB) I1225 18:21:14.715443 1 model_state.cc:220] Created new runtime on GPU device 0, NVDLA core -1 for resnet50 I1225 18:21:14.715476 1 model_state.cc:227] Created new engine on GPU device 0, NVDLA core -1 for resnet50 I1225 18:21:14.716003 1 logging.cc:49] Total per-runner device persistent memory is 64512 I1225 18:21:14.716032 1 logging.cc:49] Total per-runner host persistent memory is 299552 I1225 18:21:14.716812 1 logging.cc:49] Allocated activation device memory of size 28901376 I1225 18:21:14.721627 1 logging.cc:46] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +28, now: CPU 0, GPU 76 (MiB) W1225 18:21:14.721651 1 logging.cc:43] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage and speed up TensorRT initialization. 
See "Lazy Loading" section of CUDA documentation https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#lazy-loading I1225 18:21:14.721666 1 instance_state.cc:1797] Detected input as execution binding for resnet50 I1225 18:21:14.721669 1 instance_state.cc:1797] Detected output as execution binding for resnet50 I1225 18:21:14.722490 1 instance_state.cc:188] Created instance resnet50 on GPU 0 with stream priority 0 and optimization profile default[0]; I1225 18:21:14.722909 1 backend_model_instance.cc:806] Starting backend thread for resnet50 at nice 0 on device 0... I1225 18:21:14.723570 1 model_lifecycle.cc:815] successfully loaded 'resnet50' I1225 18:21:14.723688 1 server.cc:603] +------------------+------+ | Repository Agent | Path | +------------------+------+ +------------------+------+

I1225 18:21:14.723756 1 server.cc:630] +----------+-----------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+ Backend Path Config
+----------+-----------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+ pytorch /opt/tritonserver/backends/pytorch/libtriton_pytorch.so {}
tensorrt /opt/tritonserver/backends/tensorrt/libtriton_tensorrt.so {"cmdline":{"auto-complete-config":"true","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","default-max-batch-size":"4"}}

+----------+-----------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+

    I1225 18:21:14.723789 1 server.cc:673]
    +----------+---------+--------+
    | Model    | Version | Status |
    +----------+---------+--------+
    | resnet50 | 1       | READY  |
    +----------+---------+--------+

I1225 18:21:14.769573 1 metrics.cc:808] Collecting metrics for GPU 0: NVIDIA GeForce RTX 3060 Laptop GPU I1225 18:21:14.769730 1 metrics.cc:701] Collecting CPU metrics I1225 18:21:14.769924 1 tritonserver.cc:2385] +----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ Option Value
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ server_id triton
server_version 2.35.0
server_extensions classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data parameters statistics trace logging
model_repository_path[0] /models
model_control_mode MODE_NONE
strict_model_config 0
rate_limit OFF
pinned_memory_pool_byte_size 268435456
cuda_memory_pool_byte_size{0} 67108864
min_supported_compute_capability 6.0
strict_readiness 1
exit_timeout 30
cache_enabled 0

+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

    I1225 18:21:14.778708 1 grpc_server.cc:2339]
    +----------------------------------------------+---------+
    | GRPC KeepAlive Option                        | Value   |
    +----------------------------------------------+---------+
    | keepalive_time_ms                            | 7200000 |
    | keepalive_timeout_ms                         | 20000   |
    | keepalive_permit_without_calls               | 0       |
    | http2_max_pings_without_data                 | 2       |
    | http2_min_recv_ping_interval_without_data_ms | 300000  |
    | http2_max_ping_strikes                       | 2       |
    +----------------------------------------------+---------+

    I1225 18:21:14.784993 1 grpc_server.cc:99] Ready for RPC 'Check', 0
    I1225 18:21:14.785606 1 grpc_server.cc:99] Ready for RPC 'ServerLive', 0
    I1225 18:21:14.785635 1 grpc_server.cc:99] Ready for RPC 'ServerReady', 0
    I1225 18:21:14.785640 1 grpc_server.cc:99] Ready for RPC 'ModelReady', 0
    I1225 18:21:14.785645 1 grpc_server.cc:99] Ready for RPC 'ServerMetadata', 0
    I1225 18:21:14.785650 1 grpc_server.cc:99] Ready for RPC 'ModelMetadata', 0
    I1225 18:21:14.785654 1 grpc_server.cc:99] Ready for RPC 'ModelConfig', 0
    I1225 18:21:14.785958 1 grpc_server.cc:99] Ready for RPC 'SystemSharedMemoryStatus', 0
    I1225 18:21:14.785984 1 grpc_server.cc:99] Ready for RPC 'SystemSharedMemoryRegister', 0
    I1225 18:21:14.785989 1 grpc_server.cc:99] Ready for RPC 'SystemSharedMemoryUnregister', 0
    I1225 18:21:14.785994 1 grpc_server.cc:99] Ready for RPC 'CudaSharedMemoryStatus', 0
    I1225 18:21:14.785997 1 grpc_server.cc:99] Ready for RPC 'CudaSharedMemoryRegister', 0
    I1225 18:21:14.786001 1 grpc_server.cc:99] Ready for RPC 'CudaSharedMemoryUnregister', 0
    I1225 18:21:14.786005 1 grpc_server.cc:99] Ready for RPC 'RepositoryIndex', 0
    I1225 18:21:14.786008 1 grpc_server.cc:99] Ready for RPC 'RepositoryModelLoad', 0
    I1225 18:21:14.786011 1 grpc_server.cc:99] Ready for RPC 'RepositoryModelUnload', 0
    I1225 18:21:14.786015 1 grpc_server.cc:99] Ready for RPC 'ModelStatistics', 0
    I1225 18:21:14.786019 1 grpc_server.cc:99] Ready for RPC 'Trace', 0
    I1225 18:21:14.786023 1 grpc_server.cc:99] Ready for RPC 'Logging', 0
    I1225 18:21:14.786046 1 grpc_server.cc:348] Thread started for CommonHandler
    I1225 18:21:14.786530 1 infer_handler.cc:693] New request handler for ModelInferHandler, 0
    I1225 18:21:14.786575 1 infer_handler.h:1046] Thread started for ModelInferHandler
    I1225 18:21:14.786673 1 infer_handler.cc:693] New request handler for ModelInferHandler, 0
    I1225 18:21:14.786719 1 infer_handler.h:1046] Thread started for ModelInferHandler
    I1225 18:21:14.786847 1 stream_infer_handler.cc:127] New request handler for ModelStreamInferHandler, 0
    I1225 18:21:14.786891 1 infer_handler.h:1046] Thread started for ModelStreamInferHandler
    I1225 18:21:14.786898 1 grpc_server.cc:2445] Started GRPCInferenceService at 0.0.0.0:8001
    I1225 18:21:14.787280 1 http_server.cc:3555] Started HTTPService at 0.0.0.0:8000
    I1225 18:21:14.828943 1 http_server.cc:185] Started Metrics Service at 0.0.0.0:8002
    W1225 18:21:15.774691 1 metrics.cc:573] Unable to get power limit for GPU 0. Status:Success, value:0.000000
    W1225 18:21:16.775075 1 metrics.cc:573] Unable to get power limit for GPU 0. Status:Success, value:0.000000
    W1225 18:21:17.779516 1 metrics.cc:573] Unable to get power limit for GPU 0. Status:Success, value:0.000000


2.) Send an inference request from the client.

This is the code I use to send the request to the Triton server.

    import numpy as np
    import tritonclient.http as httpclient

    # HTTP client; network_timeout (seconds) limits how long the client waits on the socket.
    triton_client = httpclient.InferenceServerClient(url='localhost:8000', verbose=True, network_timeout=60.0)


    def test_infer(model_name, data):
        # Describe the input tensor and attach the NumPy data to it.
        inputs = []
        outputs = []
        inputs.append(httpclient.InferInput('input', list(data.shape), "FP32"))
        inputs[0].set_data_from_numpy(data)

        # Request 'output' as JSON (non-binary) classification results.
        outputs.append(httpclient.InferRequestedOutput('output', binary_data=False, class_count=1000))

        results = triton_client.infer(
            model_name,
            inputs,
            model_version="1",
            outputs=outputs)

        return results


    if __name__ == '__main__':
        data = np.random.rand(8, 3, 224, 224).astype(np.float32)
        res = test_infer('resnet50', data)
        print(res)
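
Before sending an inference request, it can help to confirm that the HTTP endpoint itself answers quickly. The sketch below uses only the standard library and the health endpoints of the KServe v2 HTTP protocol that Triton implements; `health_urls` and `probe` are hypothetical helper names, and the host, port, model name and version are taken from the command and code in this report.

```python
import urllib.error
import urllib.request


def health_urls(base_url, model_name, model_version):
    """Build the v2 health-check URLs for the server and one model version."""
    base = base_url.rstrip("/")
    return {
        "live": f"{base}/v2/health/live",
        "ready": f"{base}/v2/health/ready",
        "model_ready": f"{base}/v2/models/{model_name}/versions/{model_version}/ready",
    }


def probe(url, timeout_s=5.0):
    """Return True if a GET on url answers with HTTP 200 within timeout_s seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx answer all count as "not ready".
        return False


urls = health_urls("http://localhost:8000", "resnet50", "1")
for name, url in urls.items():
    print(name, url)
```

Calling `probe(urls["model_ready"])` against the running container would show whether small GET requests complete even when the large POST hangs, which helps separate a network or port-forwarding problem from a model problem.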

    (base) D:\github\cnn_classification_flower\deploy_triton>python client.py
    POST /v2/models/resnet50/versions/1/infer, headers {'Inference-Header-Content-Length': 198}
    b'{"inputs":[{"name":"input","shape":[8,3,224,224],"datatype":"FP32","parameters":{"binary_data_size":4816896}}],"outputs":[{"name":"output","parameters":{"classification":1000,"binary_data":false}}]}0ap?\xb9\x9f\xd7>G\xfe\xc2>\xa6\x8df?...

(The rest of the raw binary tensor payload is omitted.)


Then this is the server's response:

I1225 18:24:49.751096 1 http_server.cc:3449] HTTP request: 2 /v2/models/resnet50/versions/1/infer
I1225 18:24:49.753654 1 infer_request.cc:751] [request id: ] prepared: [0x0x7f3e6c004a50] request id: , model: resnet50, requested version: 1, actual version: 1, flags: 0x0, correlation id: 0, batch size: 0, priority: 0, timeout (us): 0
original inputs:
[0x0x7f3e6c0050b8] input: input, type: FP32, original shape: [8,3,224,224], batch + shape: [8,3,224,224], shape: [8,3,224,224]
override inputs:
inputs:
[0x0x7f3e6c0050b8] input: input, type: FP32, original shape: [8,3,224,224], batch + shape: [8,3,224,224], shape: [8,3,224,224]
original requested outputs:
output
requested outputs:
output

I1225 18:24:49.761740 1 tensorrt.cc:381] model resnet50, instance resnet50, executing 1 requests
I1225 18:24:49.762179 1 instance_state.cc:360] TRITONBACKEND_ModelExecute: Issuing resnet50 with 1 requests
I1225 18:24:49.762199 1 instance_state.cc:409] TRITONBACKEND_ModelExecute: Running resnet50 with 1 requests
I1225 18:24:49.796027 1 instance_state.cc:1437] Optimization profile default [0] is selected for resnet50
I1225 18:24:49.796391 1 pinned_memory_manager.cc:161] pinned memory allocation: size 4816896, addr 0x205000090
I1225 18:24:49.810520 1 instance_state.cc:900] Context with profile default [0] is being executed for resnet50
I1225 18:24:49.839556 1 infer_response.cc:167] add response output: output: output, type: FP32, shape: [8,1000]
I1225 18:24:49.839613 1 http_server.cc:1101] HTTP: unable to provide 'output' in GPU, will use CPU
I1225 18:24:49.840054 1 http_server.cc:1121] HTTP using buffer for: 'output', size: 32000, addr: 0x7f3ea4008ec0
I1225 18:24:49.840071 1 pinned_memory_manager.cc:161] pinned memory allocation: size 32000, addr 0x2054980a0

After this the request hangs. If no timeout is defined on the client, it waits forever.

And this is the client's log after the timeout:

Traceback (most recent call last):
  File "D:\github\cnn_classification_flower\deploy_triton\client.py", line 28, in <module>
    res = test_infer('resnet50', data)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\github\cnn_classification_flower\deploy_triton\client.py", line 16, in test_infer
    results = triton_client.infer(
              ^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ADMINS\miniconda3\Lib\site-packages\tritonclient\http\_client.py", line 1462, in infer
    response = self._post(
               ^^^^^^^^^^^
  File "C:\Users\ADMINS\miniconda3\Lib\site-packages\tritonclient\http\_client.py", line 290, in _post
    response = self._client_stub.post(
               ^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ADMINS\miniconda3\Lib\site-packages\geventhttpclient\client.py", line 272, in post
    return self.request(METHOD_POST, request_uri, body=body, headers=headers)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ADMINS\miniconda3\Lib\site-packages\geventhttpclient\client.py", line 253, in request
    response = HTTPSocketPoolResponse(sock, self._connection_pool,
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ADMINS\miniconda3\Lib\site-packages\geventhttpclient\response.py", line 292, in __init__
    super(HTTPSocketPoolResponse, self).__init__(sock, **kw)
  File "C:\Users\ADMINS\miniconda3\Lib\site-packages\geventhttpclient\response.py", line 164, in __init__
    self._read_headers()
  File "C:\Users\ADMINS\miniconda3\Lib\site-packages\geventhttpclient\response.py", line 184, in _read_headers
    data = self._sock.recv(self.block_size)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ADMINS\miniconda3\Lib\site-packages\gevent\_socketcommon.py", line 666, in recv
    self._wait(self._read_event)
  File "src\gevent\_hub_primitives.py", line 317, in gevent._gevent_c_hub_primitives.wait_on_socket
  File "src\gevent\_hub_primitives.py", line 322, in gevent._gevent_c_hub_primitives.wait_on_socket
  File "src\gevent\_hub_primitives.py", line 313, in gevent._gevent_c_hub_primitives._primitive_wait
  File "src\gevent\_hub_primitives.py", line 314, in gevent._gevent_c_hub_primitives._primitive_wait
  File "src\gevent\_hub_primitives.py", line 46, in gevent._gevent_c_hub_primitives.WaitOperationsGreenlet.wait
  File "src\gevent\_hub_primitives.py", line 46, in gevent._gevent_c_hub_primitives.WaitOperationsGreenlet.wait
  File "src\gevent\_hub_primitives.py", line 55, in gevent._gevent_c_hub_primitives.WaitOperationsGreenlet.wait
  File "src\gevent\_waiter.py", line 154, in gevent._gevent_c_waiter.Waiter.get
  File "src\gevent\_greenlet_primitives.py", line 61, in gevent._gevent_c_greenlet_primitives.SwitchOutGreenletWithLoop.switch
  File "src\gevent\_greenlet_primitives.py", line 61, in gevent._gevent_c_greenlet_primitives.SwitchOutGreenletWithLoop.switch
  File "src\gevent\_greenlet_primitives.py", line 65, in gevent._gevent_c_greenlet_primitives.SwitchOutGreenletWithLoop.switch
  File "src\gevent\_gevent_c_greenlet_primitives.pxd", line 35, in gevent._gevent_c_greenlet_primitives._greenlet_switch
TimeoutError: timed out


I also tried to check the model's health via

curl -v http://localhost:8000/v2/models/resnet50/versions/1/infer/health/ready

It returns "400 Bad Request".
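A note for readers who hit the same 400: the path in that curl command is not one of the readiness routes in the KServe v2 HTTP protocol that Triton implements, which is the likely cause of the error. Server readiness lives at `/v2/health/ready` and model readiness at `/v2/models/<model>[/versions/<version>]/ready`. Below is a minimal sketch that builds those URLs and probes them with the standard library. It assumes the server is reachable at `localhost:8000`, as in the `docker run` command above. The helper names are made up for illustration and are not part of any Triton client library.

```python
import urllib.request


def server_ready_url(base_url):
    """URL of the server-wide readiness route."""
    return f"{base_url.rstrip('/')}/v2/health/ready"


def model_ready_url(base_url, model, version=None):
    """URL of the per-model readiness route; the version part is optional."""
    path = f"/v2/models/{model}"
    if version is not None:
        path += f"/versions/{version}"
    return f"{base_url.rstrip('/')}{path}/ready"


def is_ready(url, timeout=5.0):
    """Return True if a GET on `url` answers with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers connection errors, timeouts, and non-2xx HTTP responses.
        return False


if __name__ == "__main__":
    base = "http://localhost:8000"  # assumption: port mapping from the docker command
    for url in (server_ready_url(base), model_ready_url(base, "resnet50", 1)):
        print(url, "->", "ready" if is_ready(url) else "not ready")
```

Note that a model reporting ready only means it loaded. As this issue shows, a loaded engine can still hang at execution time.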

I'm new to Triton Inference Server and have been stuck on this for 2-3 days. Can someone help me get past this? 🤦

rungrodkspeed commented 9 months ago

OK, I solved it.

The model.plan that I exported with the TensorRT Python API did not work for me.

I solved it by using the !trtexec command instead, which calls the TensorRT backend directly.
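The exact `trtexec` invocation is not given in this thread. As a rough sketch of what such a build could look like, the snippet below assembles a command line using the `--onnx`, `--saveEngine` and `--shapes` flags of `trtexec`. It assumes the model was first exported to ONNX, which the thread does not state. The file names are placeholders, and `trtexec_command` is a hypothetical helper. The important part is to run it with a TensorRT installation whose version matches the one inside the Triton container.

```python
import shlex


def trtexec_command(onnx_path, plan_path, input_name=None, shape=None):
    """Build a trtexec argument list that converts an ONNX file to a .plan engine.

    `input_name` and `shape` are only meaningful for models with dynamic
    dimensions; leave them out for a fully static model.
    """
    cmd = ["trtexec", f"--onnx={onnx_path}", f"--saveEngine={plan_path}"]
    if input_name and shape:
        dims = "x".join(str(d) for d in shape)
        cmd.append(f"--shapes={input_name}:{dims}")
    return cmd


if __name__ == "__main__":
    # Placeholder paths; "input" and [8, 3, 224, 224] come from the config above.
    cmd = trtexec_command("resnet50.onnx", "resnet50.plan", "input", [8, 3, 224, 224])
    print(shlex.join(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```

The resulting `resnet50.plan` would then go into `models/resnet50/1/` as in the repository layout at the top of the issue.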

But I don't know the reason. I stumbled on the fix by trial and error.

If someone can explain it, please do.

rungrodkspeed commented 9 months ago

I think I know the reason now.

The model.plan I exported earlier was built with a different TensorRT version than the TensorRT backend running in my Triton server.
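This matches how TensorRT engines generally behave: a serialized plan is tied to the TensorRT version (and GPU) it was built with, so it is worth comparing versions before copying a plan into the model repository. Below is a minimal sketch of such a check. The function names are hypothetical, the version strings in the example are placeholders, and the runtime version has to be looked up for the Triton container release being used (for example in NVIDIA's release notes).

```python
def parse_version(text):
    """Turn '8.6.1.6' into (8, 6, 1, 6). Non-numeric suffixes are not handled."""
    return tuple(int(part) for part in text.strip().split("."))


def same_tensorrt(build_version, runtime_version, fields=3):
    """Compare the first `fields` components (major.minor.patch by default)."""
    return parse_version(build_version)[:fields] == parse_version(runtime_version)[:fields]


def local_tensorrt_version():
    """Version of the TensorRT Python package in this environment, or None."""
    try:
        import tensorrt  # only present where the engine is built
    except ImportError:
        return None
    return tensorrt.__version__


if __name__ == "__main__":
    build = local_tensorrt_version() or "8.5.3"  # placeholder fallback
    runtime = "8.6.1"  # placeholder: look this up for your Triton image
    if not same_tensorrt(build, runtime):
        print(f"Mismatch: plan built with {build}, server runs {runtime}; rebuild the plan.")
```

Building the plan inside an environment that ships the same TensorRT release as the server avoids the mismatch entirely.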

I hope this issue is useful to someone. 🤗