triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
BSD 3-Clause "New" or "Revised" License

VPU support for OpenVINO backend #3930

Open mattdibi opened 2 years ago

mattdibi commented 2 years ago

Is your feature request related to a problem? Please describe.
The OpenVINO backend currently supports inference only on CPU devices, using the OpenVINO CPU plugin.

Describe the solution you'd like
I would like to perform inference on VPU devices (Myriad/Myriad X accelerators) using the OpenVINO backend through Triton Server, by specifying the target device in the model configuration, for example:

name: "mymodel"
backend: "openvino"
max_batch_size : 1
version_policy: { all { }}
input [
  {
    ...
  }
]

output [
  {
   ...
  }
]

instance_group {
  kind: KIND_VPU
}

Describe alternatives you've considered
Running models through OpenVINO directly, or switching to OpenVINO Model Server.
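For reference, the first alternative (running the model through OpenVINO directly on a Myriad device) looks roughly like the sketch below. This is a minimal, illustrative example assuming the OpenVINO Runtime Python API (2022.x) and a model already converted to IR format; "model.xml" is a placeholder path, and "MYRIAD" is the OpenVINO device name for Myriad/Myriad X accelerators.

import numpy as np
from openvino.runtime import Core  # OpenVINO Runtime Python API (2022.x)

core = Core()
model = core.read_model("model.xml")            # placeholder path to an IR model
compiled = core.compile_model(model, "MYRIAD")  # "MYRIAD" targets the Myriad/Myriad X VPU plugin

# Run a single inference with a zero-filled placeholder input tensor.
infer_request = compiled.create_infer_request()
dummy_input = np.zeros(list(compiled.input(0).shape), dtype=np.float32)
infer_request.infer({compiled.input(0): dummy_input})
output = infer_request.get_output_tensor(0).data
print(output.shape)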

Tabrizian commented 2 years ago

cc @tanmayv25

hhh111119 commented 1 year ago

Hi, may I know whether there is any plan to support VPU in the future? Thanks.