triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
BSD 3-Clause "New" or "Revised" License

The model has an issue with the OpenVINO backend. #6747

Open chiehpower opened 9 months ago

chiehpower commented 9 months ago

Description

Hi all,

I have an IR model that I was trying to deploy on Triton server v23.10, but it failed with this error:

Warning: '--strict-model-config' has been deprecated! Please use '--disable-auto-complete-config' instead.
W1228 08:29:31.708605 1 pinned_memory_manager.cc:237] Unable to allocate pinned system memory, pinned memory pool will not be available: CUDA driver version is insufficient for CUDA runtime version
I1228 08:29:31.708671 1 cuda_memory_manager.cc:117] CUDA memory pool disabled
I1228 08:29:31.709538 1 model_lifecycle.cc:461] loading: openvino_model:1
I1228 08:29:31.714584 1 openvino.cc:1345] TRITONBACKEND_Initialize: openvino
I1228 08:29:31.714617 1 openvino.cc:1355] Triton TRITONBACKEND API version: 1.16
I1228 08:29:31.714631 1 openvino.cc:1361] 'openvino' TRITONBACKEND API version: 1.16
I1228 08:29:31.714664 1 openvino.cc:1445] TRITONBACKEND_ModelInitialize: openvino_model (version 1)
W1228 08:29:31.729751 1 openvino.cc:752] model layout for model openvino_model does not support batching while non-zero max_batch_size is specified
I1228 08:29:31.729823 1 openvino.cc:1470] TRITONBACKEND_ModelFinalize: delete model state
E1228 08:29:31.729845 1 model_lifecycle.cc:621] failed to load 'openvino_model' version 1: Internal: openvino error in retrieving original shapes fromoutput valid : get_shape was called on a descriptor::Tensor with dynamic shape
I1228 08:29:31.729863 1 model_lifecycle.cc:756] failed to load 'openvino_model'
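
The key line is the final Internal error: get_shape was called on a descriptor::Tensor with dynamic shape, which means at least one output of the IR model still has a dynamic dimension. As a quick check (a minimal sketch assuming the OpenVINO 2.0 Python API; model.xml is a placeholder path), you can print the partial shape of every output:

import openvino as ov

core = ov.Core()
model = core.read_model("model.xml")  # placeholder path to the IR model

for output in model.outputs:
    # is_dynamic is True when any dimension is unbounded, which is exactly
    # what triggers the get_shape error in the OpenVINO backend.
    shape = output.get_partial_shape()
    print(output.get_any_name(), shape, "dynamic" if shape.is_dynamic else "static")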

I also tried writing a config.pbtxt file; here is the content. I still got the same error.

name: "openvino_model"
backend: "openvino"
max_batch_size: 1

instance_group {
  kind: KIND_CPU
}

parameters: [
  {
    key: "ENABLE_BATCH_PADDING"
    value: {
      string_value: "YES"
    }
  }
]

I'm not sure whether this is because the config file is incorrect.

Are there any suggestions?

Triton Information

Container image: nvcr.io/nvidia/tritonserver:23.10-py3

Tabrizian commented 8 months ago

@tanmayv25 any ideas?

dyastremsky commented 7 months ago

CC: @tanmayv25

BigVikker commented 7 months ago

I got the same issue. The best answer is that the OpenVINO backend does not support dynamic batch sizes yet, so keep that in mind. I have two solutions. Solution 1: customize your own backend. Solution 2 (short-term): since the backend does not support dynamic shapes, export the model with static shapes instead.
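
A sketch of that short-term workaround (assuming the OpenVINO 2.0 Python API; the input name and shape below are placeholders for the model's real ones): reshape the IR to static dimensions and save a new copy for Triton to load. After reshaping, output shapes are re-inferred and normally become static as well, which avoids the get_shape error above.

import openvino as ov

core = ov.Core()
model = core.read_model("model.xml")  # placeholder path to the dynamic IR

# Pin every dynamic dimension to a concrete value, e.g. batch size 1.
# "input" and [1, 3, 224, 224] are placeholders for the real name and shape.
model.reshape({"input": [1, 3, 224, 224]})

# Write out the static-shape IR (keeping original weight precision); point
# the Triton model repository at this copy instead.
ov.save_model(model, "model_static.xml", compress_to_fp16=False)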