Open mayani-nv opened 2 years ago
@askhade Do you have any insights into the error? The error makes it look like an ONNXRT/OpenVINO integration issue, but the model seems to work with the Python frontend of ONNX-RT with the OpenVINO EP.
From the error message it looks like it is unable to get the input "input_ids:0". Maybe some issue with input mapping; not sure, this needs investigation. How urgent is this?
This experiment was done as a part of Model Analyzer integration with onnxruntime's OLIVE tool. The ask was to see how the ORT hyper-parameters (backends, precision, etc.) can be swept using MA.
@askhade I tried with a YOLOv2 ONNX model and the OpenVINO backend seems to be working fine. It is only with the BERT ONNX model that this error persists. I also tried to run my BERT ONNX model with the ORT-CPU-only backend by commenting out the following lines in my config.pbtxt:
#optimization {
#  execution_accelerators {
#    cpu_execution_accelerator {
#      name: "openvino"
#    }
#  }
#}
I get the following error:
docker run -it --rm --net=host nvcr.io/nvidia/tritonserver:21.06-py3-sdk
root@AMLTritonTester:/workspace# perf_analyzer -m bert_onnx_cpu --concurrency-range 1:4
*** Measurement Settings ***
Batch size: 1
Using "time_windows" mode for stabilization
Measurement window: 5000 msec
Latency limit: 0 msec
Concurrency limit: 4 concurrent requests
Using synchronous calls for inference
Stabilizing using average latency
Request concurrency: 1
Failed to maintain requested inference load. Worker thread(s) failed to generate concurrent requests.
Thread [0] had error: onnxruntime execute failure 2: Non-zero status code returned while running Gather node. Name:'bert/embeddings/GatherV2' Status Message: indices element out of data bounds, idx=-1420042007188224409 must be within the inclusive range [-30522,30521]
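For context on that Gather failure: the embedding lookup only accepts indices in the inclusive range [-30522, 30521], and an arbitrary 64-bit value essentially never lands in it. A tiny sketch using the exact index from the error above:

```python
vocab = 30522
idx = -1420042007188224409  # the out-of-range index from the Gather error above

# Valid ids for the embedding table are [-vocab, vocab - 1], i.e. [-30522, 30521].
in_range = -vocab <= idx <= vocab - 1
print(in_range)  # False
```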
@mayani-nv BERT is a data-sensitive model. By default, perf_analyzer fills input tensors with random data, and the model might not like that. You should be able to get it working by providing realistic data as JSON to perf_analyzer, or by providing -z, like below:
perf_analyzer -m bert_onnx_cpu --concurrency-range 1:4 -z
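As an alternative to -z, in-range token data can be supplied to perf_analyzer through its --input-data JSON file. A minimal sketch for generating one; the input name input_ids:0 and vocab size 30522 come from the errors in this thread, while the sequence length of 128 and the omission of the model's other tensors (mask, segment ids) are assumptions to adapt:

```python
import json
import random

VOCAB_SIZE = 30522  # from the Gather error: valid ids are [0, 30521]
SEQ_LEN = 128       # assumed sequence length; match the model's config.pbtxt

# One request's worth of in-range token ids; perf_analyzer cycles through "data".
payload = {
    "data": [
        {"input_ids:0": [random.randrange(VOCAB_SIZE) for _ in range(SEQ_LEN)]}
    ]
}

with open("bert_input_data.json", "w") as f:
    json.dump(payload, f)
```

Then run perf_analyzer -m bert_onnx_cpu --input-data bert_input_data.json --concurrency-range 1:4. The real zoo BERT model likely needs its remaining input tensors populated the same way.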
@tanmayv25 thank you for the suggestion. So for the ORT-CPU-only backend, providing the -z option helped and I am getting the following:
./perf_analyzer -m bert_onnx_cpu -z --concurrency-range 4
*** Measurement Settings ***
Batch size: 1
Using "time_windows" mode for stabilization
Measurement window: 5000 msec
Using synchronous calls for inference
Stabilizing using average latency
Request concurrency: 4
Client:
Request count: 19
Throughput: 3.8 infer/sec
Avg latency: 1011611 usec (standard deviation 215023 usec)
p50 latency: 1057326 usec
p90 latency: 1312771 usec
p95 latency: 1315162 usec
p99 latency: 1315297 usec
Avg HTTP time: 993732 usec (send/recv 60 usec + response wait 993672 usec)
Server:
Inference count: 24
Execution count: 19
Successful request count: 19
Avg request latency: 993315 usec (overhead 43 usec + queue 306508 usec + compute input 41 usec + compute infer 686683 usec + compute output 40 usec)
Inferences/Second vs. Client Average Batch Latency
Concurrency: 4, throughput: 3.8 infer/sec, latency 1011611 usec
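As a quick sanity check of those numbers: with synchronous clients, throughput should be roughly concurrency divided by average latency (Little's law), which lines up with the report above:

```python
# Little's law check on the perf_analyzer report: throughput ~= concurrency / latency.
concurrency = 4
avg_latency_usec = 1_011_611
expected_infer_per_sec = concurrency / (avg_latency_usec / 1_000_000)
print(round(expected_infer_per_sec, 1))  # ~4.0, close to the reported 3.8 infer/sec
```

The small gap is expected, since the client-side measurement also includes time spent outside inference execution.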
However, doing the same with the ORT-OpenVINO backend still gives the same error:
./perf_analyzer -m bert_onnx_cpu -z --concurrency-range 4
*** Measurement Settings ***
Batch size: 1
Using "time_windows" mode for stabilization
Measurement window: 5000 msec
Using synchronous calls for inference
Stabilizing using average latency
Request concurrency: 4
Failed to maintain requested inference load. Worker thread(s) failed to generate concurrent requests.
Thread [0] had error: onnx runtime error 6: Non-zero status code returned while running OpenVINO-EP-subgraph_5 node. Name:'OpenVINOExecutionProvider_OpenVINO-EP-subgraph_5_1' Status Message: Cannot find blob with name: input_ids:0
Thread [1] had error: onnx runtime error 6: Non-zero status code returned while running OpenVINO-EP-subgraph_2 node. Name:'OpenVINOExecutionProvider_OpenVINO-EP-subgraph_2_1' Status Message: Cannot find blob with name: input_ids:0
Thread [2] had error: onnx runtime error 6: Non-zero status code returned while running OpenVINO-EP-subgraph_2 node. Name:'OpenVINOExecutionProvider_OpenVINO-EP-subgraph_2_1' Status Message: Cannot find blob with name: input_ids:0
Thread [3] had error: onnx runtime error 6: Non-zero status code returned while running OpenVINO-EP-subgraph_5 node. Name:'OpenVINOExecutionProvider_OpenVINO-EP-subgraph_5_1' Status Message: Cannot find blob with name: input_ids:0
Yes... IMO the OpenVINO error is not because of the tensor data but because of the OpenVINO integration with ONNXRT.
Description
The ONNXRT-OpenVINO backend produces the errors above when run with Triton. The error shows up when running the BERT ONNX model from the zoo. However, when the same model is run from a Jupyter notebook outside of Triton with the ONNXRT-OpenVINO backend, it produces the desired outputs.

Triton Information
Triton server container v21.10
Are you using the Triton container or did you build it yourself? Using container v21.10

To Reproduce
1. Download the BERT ONNX model from the ONNX zoo.
2. The following is the config.pbtxt which uses the OpenVINO accelerator.
3. Run perf_analyzer on the Triton-hosted model and get the following error:

Request concurrency: 1
Failed to maintain requested inference load. Worker thread(s) failed to generate concurrent requests.
Thread [0] had error: onnx runtime error 6: Non-zero status code returned while running OpenVINO-EP-subgraph_5 node. Name:'OpenVINOExecutionProvider_OpenVINO-EP-subgraph_5_1' Status Message: Cannot find blob with name: input_ids:0