openvinotoolkit / openvino

OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference
https://docs.openvino.ai
Apache License 2.0

[Bug]: Run OpenVINO benchmark_app with Yolo-v4-tf and yolo-v8 INT8 model Failed on NPU #24560

Open joey5678 opened 2 weeks ago

joey5678 commented 2 weeks ago

OpenVINO Version

2024.1.0-15008-f4afc983258-releases/2024/1

Operating System

Other (Please specify in description)

Device used for inference

NPU

Framework

TensorFlow 1

Model used

yolo-v4-tf

Issue description

Running the following command:

./benchmark_app -d NPU -m /home/aibox/share/models/public/yolo-v4-tf/FP16/yolo-v4-tf.xml -t 20

fails with this error:

[ ERROR ] Exception from src/plugins/intel_npu/src/backend/include/zero_utils.hpp:21: L0 zeFenceHostSynchronize result: ZE_RESULT_ERROR_DEVICE_LOST, code 0x70000001 - device hung, reset, was removed, or driver update occurred

The same error occurs with the yolo-v4-tf INT8 model.

Step-by-step reproduction

1. Set up Ubuntu 22.04.3 on a Meteor Lake (MTL) platform (Ultra 7 165HL).
2. Install the GPU driver and the NPU v1.2 driver.
3. Install OpenVINO 2024.1.
4. Run the install_dependency script.
5. Build the C++ benchmark_app.

apt list --installed | grep -E 'intel|zero|opencl'

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

intel-driver-compiler-npu/now 1.2.0.20240404-8553879914 amd64 [installed,local]
intel-fw-npu/now 1.2.0.20240404-8553879914 amd64 [installed,local]
intel-gpu-tools/jammy,now 1.26-2 amd64 [installed]
intel-igc-cm/unknown,now 1.0.224-821~22.04 amd64 [installed]
intel-igc-core/now 1.0.16510.2 amd64 [installed,local]
intel-igc-opencl/now 1.0.16510.2 amd64 [installed,local]
intel-level-zero-gpu/now 1.3.29138.7 amd64 [installed,local]
intel-level-zero-npu/now 1.2.0.20240404-8553879914 amd64 [installed,local]
intel-media-va-driver-non-free/unknown,now 23.4.3-804~22.04 amd64 [installed]
intel-opencl-icd/now 24.13.29138.7 amd64 [installed,local]
level-zero-dev/unknown,now 1.16.15-821~22.04 amd64 [installed]
level-zero/unknown,now 1.16.15-821~22.04 amd64 [installed]
libdrm-intel1/unknown,now 2.4.119-2101~22.04 amd64 [installed,automatic]
ocl-icd-libopencl1/jammy,now 2.2.14-3 amd64 [installed]
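
The NPU compiler, firmware, and Level Zero packages in the listing above all report the same version. A quick way to confirm that on another machine is to parse the `apt list` output and compare the versions of the `-npu` packages. The following is a minimal Python sketch and not part of the original report; the function names `parse_apt_list` and `npu_versions` are hypothetical, and it assumes every package line has the shape `name/suite version arch [status]` seen above.

```python
import re


def parse_apt_list(text):
    """Return {package_name: version} from `apt list --installed` output."""
    packages = {}
    for line in text.splitlines():
        # Assumed line shape: name/suite version arch [status]
        match = re.match(r"^(\S+?)/\S+\s+(\S+)\s+\S+\s+\[", line.strip())
        if match:
            packages[match.group(1)] = match.group(2)
    return packages


def npu_versions(packages):
    """Collect the distinct versions of packages whose name ends in '-npu'."""
    return {version for name, version in packages.items() if name.endswith("-npu")}


# A few lines copied from the listing in this report.
SAMPLE = """\
WARNING: apt does not have a stable CLI interface. Use with caution in scripts.
intel-driver-compiler-npu/now 1.2.0.20240404-8553879914 amd64 [installed,local]
intel-fw-npu/now 1.2.0.20240404-8553879914 amd64 [installed,local]
intel-level-zero-gpu/now 1.3.29138.7 amd64 [installed,local]
intel-level-zero-npu/now 1.2.0.20240404-8553879914 amd64 [installed,local]
level-zero/unknown,now 1.16.15-821~22.04 amd64 [installed]
"""

if __name__ == "__main__":
    versions = npu_versions(parse_apt_list(SAMPLE))
    print("NPU package versions:", sorted(versions))
    print("consistent" if len(versions) == 1 else "MISMATCH")
```

A single distinct version only shows that the NPU packages were installed from one release; it says nothing about whether that release is compatible with the installed OpenVINO build.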

Relevant log output

./benchmark_app -d NPU -m /home/aibox/share/models/public/yolo-v4-tf/FP16/yolo-v4-tf.xml -t 20
[Step 1/11] Parsing and validating input arguments
[ INFO ] Parsing input parameters
[Step 2/11] Loading OpenVINO Runtime
[ INFO ] OpenVINO:
[ INFO ] Build ................................. 2024.1.0-15008-f4afc983258-releases/2024/1
[ INFO ] 
[ INFO ] Device info:
[ INFO ] NPU
[ INFO ] Build ................................. 2024.1.0-15008-f4afc983258-releases/2024/1
[ INFO ] 
[ INFO ] 
[Step 3/11] Setting device configuration
[ WARNING ] Performance hint was not explicitly specified in command line. Device(NPU) performance hint will be set to THROUGHPUT.
[Step 4/11] Reading model files
[ INFO ] Loading model files
[ INFO ] Read model took 7.74 ms
[ INFO ] Original model I/O parameters:
[ INFO ] Network inputs:
[ INFO ]     image_input (node: image_input) : f32 / [N,H,W,C] / [1,608,608,3]
[ INFO ] Network outputs:
[ INFO ]     conv2d_101 (node: model/conv2d_101/BiasAdd) : f32 / [...] / [1,38,38,255]
[ INFO ]     conv2d_109 (node: model/conv2d_109/BiasAdd) : f32 / [...] / [1,19,19,255]
[ INFO ]     conv2d_93 (node: model/conv2d_93/BiasAdd) : f32 / [...] / [1,76,76,255]
[Step 5/11] Resizing model to match image sizes and given batch
[Step 6/11] Configuring input of the model
[ INFO ] Model batch size: 1
[ INFO ] Network inputs:
[ INFO ]     image_input (node: image_input) : u8 / [N,H,W,C] / [1,608,608,3]
[ INFO ] Network outputs:
[ INFO ]     conv2d_101 (node: model/conv2d_101/BiasAdd) : f32 / [...] / [1,38,38,255]
[ INFO ]     conv2d_109 (node: model/conv2d_109/BiasAdd) : f32 / [...] / [1,19,19,255]
[ INFO ]     conv2d_93 (node: model/conv2d_93/BiasAdd) : f32 / [...] / [1,76,76,255]
[Step 7/11] Loading the model to the device
[ INFO ] Compile model took 4886.74 ms
[Step 8/11] Querying optimal runtime parameters
[ INFO ] Model:
[ INFO ]   DEVICE_ID: 
[ INFO ]   ENABLE_CPU_PINNING: NO
[ INFO ]   EXECUTION_DEVICES: NPU.3720
[ INFO ]   INFERENCE_PRECISION_HINT: f16
[ INFO ]   INTERNAL_SUPPORTED_PROPERTIES: CACHING_PROPERTIES
[ INFO ]   LOADED_FROM_CACHE: NO
[ INFO ]   NETWORK_NAME: 
[ INFO ]   OPTIMAL_NUMBER_OF_INFER_REQUESTS: 4
[ INFO ]   PERFORMANCE_HINT: THROUGHPUT
[ INFO ]   PERFORMANCE_HINT_NUM_REQUESTS: 1
[ INFO ]   PERF_COUNT: NO
[Step 9/11] Creating infer requests and preparing input tensors
[ WARNING ] No input files were given: all inputs will be filled with random values!
[ INFO ] Test Config 0
[ INFO ] image_input  ([N,H,W,C], u8, [1,608,608,3], static):   random (image/numpy array is expected)
[Step 10/11] Measuring performance (Start inference asynchronously, 4 inference requests, limits: 20000 ms duration)
[ INFO ] Benchmarking in inference only mode (inputs filling are not included in measurement loop).
[ ERROR ] Exception from src/plugins/intel_npu/src/backend/include/zero_utils.hpp:21:
L0 zeFenceHostSynchronize result: ZE_RESULT_ERROR_DEVICE_LOST, code 0x70000001 - device hung, reset, was removed, or driver update occurred
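
When collecting logs from several runs, it can help to pull out just the Level Zero call, result name, and numeric code from each failure. The sketch below is illustrative only: the regular expression is an assumption based on the two error lines quoted in this issue, not a documented log format, and `find_l0_errors` is a hypothetical helper name.

```python
import re

# Assumed shape, taken from the error lines in this issue:
#   "L0 <call> result: ZE_RESULT_<NAME>, code 0x<hex>"
L0_ERROR = re.compile(
    r"L0 (?P<call>\w+) result: (?P<result>ZE_RESULT_\w+), code (?P<code>0x[0-9a-fA-F]+)"
)


def find_l0_errors(log_text):
    """Return a list of (call, result_name, code_as_int) found in a log."""
    return [
        (m.group("call"), m.group("result"), int(m.group("code"), 16))
        for m in L0_ERROR.finditer(log_text)
    ]


# Error lines copied from the logs in this issue.
SAMPLE_LOG = """\
[ ERROR ] Exception from src/plugins/intel_npu/src/backend/include/zero_utils.hpp:21:
L0 zeFenceHostSynchronize result: ZE_RESULT_ERROR_DEVICE_LOST, code 0x70000001 - device hung, reset, was removed, or driver update occurred
Failed to compile network. L0 createGraph result: ZE_RESULT_ERROR_UNKNOWN, code 0x7ffffffe. Compilation failed
"""

if __name__ == "__main__":
    for call, result, code in find_l0_errors(SAMPLE_LOG):
        print(f"{call}: {result} ({code:#x})")
```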

rkazants commented 2 weeks ago

@mlyashko, @rzubarev, please help resolve this.

joey5678 commented 1 week ago

Any update?

By the way, I got a different error when running benchmark_app with an INT8 yolov8 model in the same test environment:

error: MultiClusterStrategyAssignment Pass failed : Cannot get per cluster memory shapes. Unsupported distribution: #VPU.DistributedTensor<mode = <SEGMENTED>, num_tiles = [1, 1, 2, 1], num_clusters = 2 : i64, alignment = [1, 1, 4, 1]>

Complete output log:

./benchmark_app -d NPU -m ~/share/models/public/yolo-v8n/INT8/yolo-v8n.xml -t 15
[Step 1/11] Parsing and validating input arguments
[ INFO ] Parsing input parameters
[Step 2/11] Loading OpenVINO Runtime
[ INFO ] OpenVINO:
[ INFO ] Build ................................. 2024.1.0-15008-f4afc983258-releases/2024/1
[ INFO ] 
[ INFO ] Device info:
[ INFO ] NPU
[ INFO ] Build ................................. 2024.1.0-15008-f4afc983258-releases/2024/1
[ INFO ] 
[ INFO ] 
[Step 3/11] Setting device configuration
[ WARNING ] Performance hint was not explicitly specified in command line. Device(NPU) performance hint will be set to THROUGHPUT.
[Step 4/11] Reading model files
[ INFO ] Loading model files
[ INFO ] Read model took 11.67 ms
[ INFO ] Original model I/O parameters:
[ INFO ] Network inputs:
[ INFO ]     images (node: images) : f32 / [...] / [1,3,640,640]
[ INFO ] Network outputs:
[ INFO ]     output0 (node: output0) : f32 / [...] / [1,84,8400]
[ INFO ]     onnx::Reshape_421 (node: onnx::Reshape_421) : f32 / [...] / [1,144,80,80]
[ INFO ]     onnx::Reshape_436 (node: onnx::Reshape_436) : f32 / [...] / [1,144,40,40]
[ INFO ]     onnx::Reshape_451 (node: onnx::Reshape_451) : f32 / [...] / [1,144,20,20]
[Step 5/11] Resizing model to match image sizes and given batch
[ WARNING ] images: layout is not set explicitly, so it is defaulted to NCHW. It is STRONGLY recommended to set layout manually to avoid further issues.
[Step 6/11] Configuring input of the model
[ INFO ] Model batch size: 1
[ INFO ] Network inputs:
[ INFO ]     images (node: images) : u8 / [N,C,H,W] / [1,3,640,640]
[ INFO ] Network outputs:
[ INFO ]     output0 (node: output0) : f32 / [...] / [1,84,8400]
[ INFO ]     onnx::Reshape_421 (node: onnx::Reshape_421) : f32 / [...] / [1,144,80,80]
[ INFO ]     onnx::Reshape_436 (node: onnx::Reshape_436) : f32 / [...] / [1,144,40,40]
[ INFO ]     onnx::Reshape_451 (node: onnx::Reshape_451) : f32 / [...] / [1,144,20,20]
[Step 7/11] Loading the model to the device
error: MultiClusterStrategyAssignment Pass failed : Cannot get per cluster memory shapes. Unsupported distribution: #VPU.DistributedTensor<mode = <SEGMENTED>, num_tiles = [1, 1, 2, 1], num_clusters = 2 : i64, alignment = [1, 1, 4, 1]>
[ ERROR ] Exception from src/inference/src/cpp/core.cpp:106:
Exception from src/inference/src/dev/plugin.cpp:54:
Exception from src/plugins/intel_npu/src/plugin/src/plugin.cpp:513:
Check 'result == ZE_RESULT_SUCCESS' failed at src/plugins/intel_npu/src/compiler/src/zero_compiler_in_driver.cpp:745:
Failed to compile network. L0 createGraph result: ZE_RESULT_ERROR_UNKNOWN, code 0x7ffffffe. Compilation failed
Failed to create executable
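
The two failures in this issue stop at different points: the yolo-v4-tf log ends in Step 10 (measuring performance), while this yolov8 INT8 log ends in Step 7 (loading the model to the device, which is where compilation happens). A small script can sort logs into those two groups by looking at the last step banner. This is a rough sketch: the step titles are taken from the logs in this issue, and `last_step` and `failure_phase` are hypothetical helper names.

```python
import re

# Matches step banners such as "[Step 7/11] Loading the model to the device".
STEP_BANNER = re.compile(r"^\[Step (\d+)/\d+\] (.+)$", re.MULTILINE)


def last_step(log_text):
    """Return (step_number, title) of the last step banner, or None if absent."""
    matches = STEP_BANNER.findall(log_text)
    if not matches:
        return None
    number, title = matches[-1]
    return int(number), title.strip()


def failure_phase(log_text):
    """Classify a failed log by the step it stopped in (titles assumed from this issue)."""
    step = last_step(log_text)
    if step is None:
        return "unknown"
    _, title = step
    if title.startswith("Loading the model"):
        return "compile"
    if title.startswith("Measuring performance"):
        return "inference"
    return "other"


# Shortened tails of the two logs in this issue.
YOLO_V4_TAIL = """\
[Step 9/11] Creating infer requests and preparing input tensors
[Step 10/11] Measuring performance (Start inference asynchronously, 4 inference requests, limits: 20000 ms duration)
[ ERROR ] Exception from src/plugins/intel_npu/src/backend/include/zero_utils.hpp:21:
"""

YOLO_V8_TAIL = """\
[Step 6/11] Configuring input of the model
[Step 7/11] Loading the model to the device
error: MultiClusterStrategyAssignment Pass failed : Cannot get per cluster memory shapes.
"""

if __name__ == "__main__":
    print("yolo-v4-tf:", failure_phase(YOLO_V4_TAIL))
    print("yolo-v8n INT8:", failure_phase(YOLO_V8_TAIL))
```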

The FP16 yolov8 model runs without this error.

I have attached the INT8 yolov8 model I used: my-yolo-v8n-int8.zip