lsusb | grep Myriad
Bus 003 Device 012: ID 03e7:2485 Intel Movidius MyriadX
To reproduce
On a device with a MyriadX, install onnxruntime-openvino
Download and unzip python.zip, a sample Custom Vision export using the compact domain (not S1).
Run the command python3 ./onnxruntime_predict.py sample.jpg and note the output and the inference time of around 2442 ms (shown below):
python3 ./onnxruntime_predict.py sample.jpg
2023-01-04 15:52:55.919507021 [I:onnxruntime:, inference_session.cc:263 operator()] Flush-to-zero and denormal-as-zero are off
2023-01-04 15:52:55.919580746 [I:onnxruntime:, inference_session.cc:271 ConstructorCommon] Creating and using per session threadpools since use_per_sessionthreads is true
2023-01-04 15:52:55.919608785 [I:onnxruntime:, inference_session.cc:292 ConstructorCommon] Dynamic block base set to 0
2023-01-04 15:52:56.007347026 [I:onnxruntime:, inference_session.cc:1222 Initialize] Initializing session.
2023-01-04 15:52:56.007413914 [I:onnxruntime:, inference_session.cc:1259 Initialize] Adding default CPU execution provider.
2023-01-04 15:52:56.007488562 [I:onnxruntime:, session_state.cc:31 SetupAllocators] Allocator already registered for OrtMemoryInfo:[name:Cpu id:0 OrtMemType:0 OrtAllocatorType:1 Device:[DeviceType:0 MemoryType:0 DeviceId:0]]. Ignoring allocator from CPUExecutionProvider
2023-01-04 15:52:56.012267211 [I:onnxruntime:, reshape_fusion.cc:42 ApplyImpl] Total fused reshape node count: 0
2023-01-04 15:52:56.013809384 [V:onnxruntime:, session_state.cc:1010 VerifyEachNodeIsAssignedToAnEp] Node placements
2023-01-04 15:52:56.013874785 [V:onnxruntime:, session_state.cc:1013 VerifyEachNodeIsAssignedToAnEp] All nodes placed on [OpenVINOExecutionProvider]. Number of nodes: 1
2023-01-04 15:52:56.013941934 [V:onnxruntime:, session_state.cc:66 CreateGraphInfo] SaveMLValueNameIndexMapping
2023-01-04 15:52:56.013984973 [V:onnxruntime:, session_state.cc:112 CreateGraphInfo] Done saving OrtValue mappings.
2023-01-04 15:52:56.014120618 [I:onnxruntime:, session_state_utils.cc:199 SaveInitializedTensors] Saving initialized tensors.
2023-01-04 15:52:56.014224907 [I:onnxruntime:, session_state_utils.cc:286 SaveInitializedTensors] [Memory] SessionStateInitializer statically allocates 22075136 bytes for OpenVINO_CPU
2023-01-04 15:52:56.034266935 [I:onnxruntime:, session_state_utils.cc:342 SaveInitializedTensors] Done saving initialized tensors
2023-01-04 15:52:56.034401557 [I:onnxruntime:, inference_session.cc:1488 Initialize] Session successfully initialized.
2023-01-04 15:52:56.161128992 [I:onnxruntime:, sequential_executor.cc:176 Execute] Begin execution
2023-01-04 15:52:58.496444241 [W:onnxruntime:, execution_frame.cc:828 VerifyOutputSizes] Expected shape from model of {-1,50,13,13} does not match actual shape of {1,50,12,22} for output model_outputs0
2442.52 MS
[{'probability': 0.69025356, 'tagId': 2, 'tagName': 'MailTruck', 'boundingBox': {'left': 0.24598908, 'top': 0.50931931, 'width': 0.07350751, 'height': 0.11108013}}, {'probability': 0.11874406, 'tagId': 3, 'tagName': 'Other', 'boundingBox': {'left': 0.54359399, 'top': 0.60347093, 'width': 0.13043485, 'height': 0.17345715}}]
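For context, the "2442.52 MS" line above is the script's own timing printout. A minimal sketch of how such a figure is typically produced (the exact measurement code inside onnxruntime_predict.py may differ; time_call here is a hypothetical helper, not part of the sample):

```python
import time

def time_call(fn, *args):
    """Run fn(*args) once and return (result, elapsed milliseconds)."""
    start = time.perf_counter()
    result = fn(*args)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms

# Stand-in workload in place of session.run(...) so the sketch is self-contained:
result, ms = time_call(lambda: sum(range(100_000)))
print(f"{ms:.2f} MS")
```

Note that a single timed call includes any first-run compilation or device warm-up, which matters when comparing execution providers.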
Change line 30 of onnxruntime_predict.py to remove the provider_options, rerun the command python3 ./onnxruntime_predict.py sample.jpg, and note the roughly 703 ms inference time shown below:
python3 ./onnxruntime_predict.py sample.jpg
2023-01-04 15:55:27.727558718 [I:onnxruntime:, inference_session.cc:263 operator()] Flush-to-zero and denormal-as-zero are off
2023-01-04 15:55:27.727633811 [I:onnxruntime:, inference_session.cc:271 ConstructorCommon] Creating and using per session threadpools since use_per_sessionthreads is true
2023-01-04 15:55:27.727662164 [I:onnxruntime:, inference_session.cc:292 ConstructorCommon] Dynamic block base set to 0
2023-01-04 15:55:27.815505720 [I:onnxruntime:, inference_session.cc:1222 Initialize] Initializing session.
2023-01-04 15:55:27.815576300 [I:onnxruntime:, inference_session.cc:1259 Initialize] Adding default CPU execution provider.
2023-01-04 15:55:27.815620865 [I:onnxruntime:, session_state.cc:31 SetupAllocators] Allocator already registered for OrtMemoryInfo:[name:Cpu id:0 OrtMemType:0 OrtAllocatorType:1 Device:[DeviceType:0 MemoryType:0 DeviceId:0]]. Ignoring allocator from CPUExecutionProvider
2023-01-04 15:55:27.820222957 [I:onnxruntime:, reshape_fusion.cc:42 ApplyImpl] Total fused reshape node count: 0
2023-01-04 15:55:27.821672392 [V:onnxruntime:, session_state.cc:1010 VerifyEachNodeIsAssignedToAnEp] Node placements
2023-01-04 15:55:27.821724433 [V:onnxruntime:, session_state.cc:1013 VerifyEachNodeIsAssignedToAnEp] All nodes placed on [OpenVINOExecutionProvider]. Number of nodes: 1
2023-01-04 15:55:27.821758881 [V:onnxruntime:, session_state.cc:66 CreateGraphInfo] SaveMLValueNameIndexMapping
2023-01-04 15:55:27.821789065 [V:onnxruntime:, session_state.cc:112 CreateGraphInfo] Done saving OrtValue mappings.
2023-01-04 15:55:27.821883530 [I:onnxruntime:, session_state_utils.cc:199 SaveInitializedTensors] Saving initialized tensors.
2023-01-04 15:55:27.821966558 [I:onnxruntime:, session_state_utils.cc:286 SaveInitializedTensors] [Memory] SessionStateInitializer statically allocates 22075136 bytes for OpenVINO_CPU
2023-01-04 15:55:27.841562759 [I:onnxruntime:, session_state_utils.cc:342 SaveInitializedTensors] Done saving initialized tensors
2023-01-04 15:55:27.841696043 [I:onnxruntime:, inference_session.cc:1488 Initialize] Session successfully initialized.
2023-01-04 15:55:27.968215795 [I:onnxruntime:, sequential_executor.cc:176 Execute] Begin execution
2023-01-04 15:55:28.563970520 [W:onnxruntime:, execution_frame.cc:828 VerifyOutputSizes] Expected shape from model of {-1,50,13,13} does not match actual shape of {1,50,12,22} for output model_outputs0
703.03 MS
[{'probability': 0.69322181, 'tagId': 2, 'tagName': 'MailTruck', 'boundingBox': {'left': 0.2459873, 'top': 0.50959683, 'width': 0.07348059, 'height': 0.11094462}}, {'probability': 0.1193537, 'tagId': 3, 'tagName': 'Other', 'boundingBox': {'left': 0.54340264, 'top': 0.60339437, 'width': 0.13081755, 'height': 0.17390769}}]
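The slowdown factor follows directly from the two timings the script reported:

```python
# Times reported by the two runs above:
myriad_ms = 2442.52   # with provider_options=[{'device_type': 'MYRIAD_FP16'}]
default_ms = 703.03   # with provider_options removed

# The MyriadX run is roughly 3.5x slower than the default OpenVINO device.
slowdown = myriad_ms / default_ms
print(f"slowdown: {slowdown:.2f}x")
```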
Describe the issue
When trying to accelerate an exported Custom Vision model on a MyriadX VPU, inference is roughly 3.5× slower (2442 ms vs. 703 ms) than with the default OpenVINO device.
In the default sample code from Custom Vision, I updated the InferenceSession line to:
self.session = onnxruntime.InferenceSession(temp, providers=['OpenVINOExecutionProvider'], provider_options=[{'device_type': 'MYRIAD_FP16'}])
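For clarity, the two configurations being compared differ only in the provider_options argument. The sketch below lays them out side by side; the session constructions are commented out so it reads without a model file (temp is the model path from the sample code):

```python
# Both runs use the OpenVINO execution provider.
providers = ['OpenVINOExecutionProvider']

# Configuration A: explicitly target the MyriadX VPU (~2442 ms in this report).
myriad_options = [{'device_type': 'MYRIAD_FP16'}]
# self.session = onnxruntime.InferenceSession(
#     temp, providers=providers, provider_options=myriad_options)

# Configuration B: provider_options removed, so OpenVINO picks its
# default device (~703 ms in this report).
# self.session = onnxruntime.InferenceSession(temp, providers=providers)
```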
Urgency
Low urgency; this is a development machine.
Platform
Linux
OS Version
Ubuntu 20.04
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.13.0
ONNX Runtime API
Python
Architecture
X64
Execution Provider
OpenVINO
Execution Provider Library Version
1.13.1