Open prashant-saxena opened 1 week ago
Hi prashant-saxena, Depending on the model used, device-specific optimizations and network compilations can cause the compile step to be time-consuming, especially with larger models.
OpenVINO™ can cache the model once it is compiled on supported devices and reuse it in later compile_model
calls by simply setting a cache folder beforehand:
import time
from pathlib import Path
import openvino as ov
# Create cache folder
cache_folder = Path("cache")
cache_folder.mkdir(exist_ok=True)
start = time.time()
core = ov.Core()
# Set cache folder
core.set_property({'CACHE_DIR': cache_folder})
# Compile the model as before
model = core.read_model(model=model_path)
compiled_model = core.compile_model(model, device)
print(f"Cache enabled (first time) - compile time: {time.time() - start}s")
For more information, please refer to Model Caching Overview in OpenVINO™ 2024.3 Documentation.
Hi Wan-Intel,
This post is not about compilation steps or compile time but about inference time. Inference takes 4.196 secs on the CPU and 22.595 secs on the iGPU for the same data (a 512x512 image), so the iGPU is about 5 times slower than the CPU. Why?
Could you please share the following information with us for further investigation?
Download the codeformer onnx model from here
Convert to IR using
import openvino as ov
model = ov.convert_model("models/codeformer.onnx", output=["y"])
ov.save_model(model, "models/codeformer.xml", compress_to_fp16=True)
Test image
Test script
#!/usr/bin/env python
# coding: utf-8
# python imports
from time import perf_counter
# pip imports
import numpy as np
import openvino as ov
import openvino.properties.hint as hints
from PIL import Image
# Initialize the OpenVINO runtime core
core = ov.Core()
# Per-device hints: prefer performance and low latency;
# FP16 inference precision on the GPU, FP32 on the CPU
device_property = {
    "GPU": {
        hints.execution_mode: hints.ExecutionMode.PERFORMANCE,
        hints.performance_mode: hints.PerformanceMode.LATENCY,
        hints.inference_precision: ov.Type.f16,
    },
    "CPU": {
        hints.execution_mode: hints.ExecutionMode.PERFORMANCE,
        hints.performance_mode: hints.PerformanceMode.LATENCY,
        hints.inference_precision: ov.Type.f32,
    },
}
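# Device priorities for HETERO; not used below, since the model is compiled for "CPU" or "GPU" directly
core.set_property("HETERO", {"MULTI_DEVICE_PRIORITIES": "GPU,CPU"})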
core.set_property("GPU", device_property["GPU"])
core.set_property("CPU", device_property["CPU"])
# Load input image using PIL. Make sure it's 512x512
img = Image.open('cropped.png')
original_size = img.size
img = np.asarray(img)
# Load the network from the IR model files
model = core.read_model(model="models/ir_model/codeformer.xml")
# Compile the model for the CPU/GPU
compiled_model = core.compile_model(model=model, device_name="CPU")
# Create an inference request
infer_request = compiled_model.create_infer_request()
# Preprocess
img = img.astype(np.float32) / 255.0
img = (img - 0.5) / 0.5
img = np.expand_dims(img, axis=0)
img = img.transpose(0, 3, 1, 2)
w = np.float64(1.0)
# Prepare input dictionary
input_dict = {'x': img, 'w': w,}
# Perform inference
t1_start = perf_counter()
infer_request.infer(inputs=input_dict)
print(f'Inference Time : {perf_counter()-t1_start:.3f} secs.')
# Get the output
output = infer_request.get_output_tensor(0).data
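# Post-process: CHW -> HWC, then map [-1, 1] back to [0, 1]
output_img = output[0].transpose(1, 2, 0)
output_img = (output_img * 0.5) + 0.5
# Clip before casting so out-of-range values do not wrap around in uint8
output_img = (np.clip(output_img, 0.0, 1.0) * 255).astype(np.uint8)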
# Save using PIL
im = Image.fromarray(output_img)
im.save("output.png")
im.show()
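As a sanity check on the pre- and post-processing steps in the script, the following sketch applies the same normalization and layout changes to a synthetic 512x512 RGB array instead of `cropped.png`. The random image and the helper names `preprocess` and `postprocess` are assumptions for illustration and are not part of the original script; the clipping step is a defensive addition. It only confirms that the tensor reaches the NCHW shape (1, 3, 512, 512) in the [-1, 1] range and that the inverse mapping recovers the pixels to within one 8-bit level.

```python
import numpy as np


def preprocess(img_u8):
    """HWC uint8 image -> NCHW float32 tensor scaled to [-1, 1]."""
    x = img_u8.astype(np.float32) / 255.0
    x = (x - 0.5) / 0.5
    x = np.expand_dims(x, axis=0)
    return x.transpose(0, 3, 1, 2)


def postprocess(out):
    """NCHW float tensor in [-1, 1] -> HWC uint8 image."""
    y = out[0].transpose(1, 2, 0)
    y = (y * 0.5) + 0.5
    # Clip so values slightly outside [0, 1] cannot wrap around in uint8
    return (np.clip(y, 0.0, 1.0) * 255).astype(np.uint8)


# Synthetic stand-in for the 512x512 test image (hypothetical data)
rng = np.random.default_rng(0)
fake_img = rng.integers(0, 256, size=(512, 512, 3), dtype=np.uint8)

tensor = preprocess(fake_img)
restored = postprocess(tensor)

# Largest per-pixel difference after the round trip
max_err = int(np.abs(restored.astype(np.int16) - fake_img.astype(np.int16)).max())

print("tensor shape:", tensor.shape, "dtype:", tensor.dtype)
print("value range:", float(tensor.min()), "to", float(tensor.max()))
print("max round-trip error (8-bit levels):", max_err)
```

If the real image has an alpha channel or a size other than 512x512, the shape printed here would differ from what the model expects, which is worth ruling out before comparing devices.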
Change device from CPU to GPU & see the time difference
compiled_model = core.compile_model(model=model, device_name="GPU")
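The script times a single `infer()` call, so any one-time cost of the first request may be included in the reported figure. A rough way to separate that out is to discard a few warm-up runs and report the median of several timed runs. The sketch below is a generic helper, not an OpenVINO API: `benchmark` and its parameters are hypothetical names, and the lambda is a stand-in workload that would be replaced with something like `lambda: infer_request.infer(inputs=input_dict)`.

```python
from statistics import median
from time import perf_counter


def benchmark(fn, warmup=2, runs=5):
    """Call fn() `warmup` times untimed, then time `runs` further calls."""
    for _ in range(warmup):
        fn()
    timings = []
    for _ in range(runs):
        start = perf_counter()
        fn()
        timings.append(perf_counter() - start)
    return {
        "median": median(timings),
        "min": min(timings),
        "max": max(timings),
        "runs": runs,
    }


# Stand-in workload (assumption); swap in the real inference call
calls = []
stats = benchmark(lambda: calls.append(sum(range(10_000))), warmup=2, runs=5)

print(f"median: {stats['median']:.6f} s over {stats['runs']} runs "
      f"(min {stats['min']:.6f} s, max {stats['max']:.6f} s)")
```

Running the same helper once with the model compiled for "CPU" and once for "GPU" would show whether the gap persists after warm-up.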
I've run inference on the model with both the CPU and the GPU plugin and encountered the same issue. Inference time on the CPU is 7.533s and inference time on the iGPU is 43.617s.
Let me check with the relevant team and we will update you as soon as possible.
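For reference, the two sets of timings quoted in this thread can be put side by side. The short sketch below only does the arithmetic on the reported numbers (the dictionary labels are made up for readability); it gives a slowdown of roughly 5.4x for the original report and roughly 5.8x for the reproduction.

```python
# Inference times in seconds, as reported in this thread
reports = {
    "original post": {"cpu_s": 4.196, "igpu_s": 22.595},
    "reproduction": {"cpu_s": 7.533, "igpu_s": 43.617},
}

slowdowns = {}
for name, t in reports.items():
    # Ratio of iGPU time to CPU time for the same input
    slowdowns[name] = t["igpu_s"] / t["cpu_s"]
    print(f"{name}: iGPU is {slowdowns[name]:.2f}x slower than CPU")
```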
OpenVINO Version
openvino : 2024.3.0
Operating System
Windows System
Device used for inference
iGPU
OpenVINO installation
PyPi
Programming Language
Python
Hardware Architecture
x86 (64 bits)
Model used
codeformer
Model quantization
Yes
Target Platform
Available devices:
CPU
    IMMUTABLE PROPERTIES:
        AVAILABLE_DEVICES : ""
        RANGE_FOR_ASYNC_INFER_REQUESTS : 1 1 1
        RANGE_FOR_STREAMS : 1 8
        EXECUTION_DEVICES : CPU
        FULL_DEVICE_NAME : Intel(R) Core(TM) i5-8250U CPU @ 1.60GHz
        OPTIMIZATION_CAPABILITIES : FP32 FP16 INT8 BIN EXPORT_IMPORT
        DEVICE_TYPE : integrated
        DEVICE_ARCHITECTURE : intel64
    MUTABLE PROPERTIES:
        NUM_STREAMS : 1
        AFFINITY : NONE
        INFERENCE_NUM_THREADS : 0
        PERF_COUNT : NO
        INFERENCE_PRECISION_HINT : f32
        PERFORMANCE_HINT : LATENCY
        EXECUTION_MODE_HINT : PERFORMANCE
        PERFORMANCE_HINT_NUM_REQUESTS : 0
        ENABLE_CPU_PINNING : YES
        SCHEDULING_CORE_TYPE : ANY_CORE
        MODEL_DISTRIBUTION_POLICY : ""
        ENABLE_HYPER_THREADING : YES
        DEVICE_ID : ""
        CPU_DENORMALS_OPTIMIZATION : NO
        LOG_LEVEL : LOG_NONE
        CPU_SPARSE_WEIGHTS_DECOMPRESSION_RATE : 1
        DYNAMIC_QUANTIZATION_GROUP_SIZE : 0
        KV_CACHE_PRECISION : f16
GPU
    IMMUTABLE PROPERTIES:
        AVAILABLE_DEVICES : 0
        RANGE_FOR_ASYNC_INFER_REQUESTS : 1 2 1
        RANGE_FOR_STREAMS : 1 2
        OPTIMAL_BATCH_SIZE : 1
        MAX_BATCH_SIZE : 1
        DEVICE_ARCHITECTURE : GPU: vendor=0x8086 arch=v9.0.0
        FULL_DEVICE_NAME : Intel(R) UHD Graphics 620 (iGPU)
        DEVICE_UUID : 00000000000000000000000000000000
        DEVICE_LUID : 0000000000000000
        DEVICE_TYPE : integrated
        DEVICE_GOPS : {f16:844.8,f32:422.4,i8:422.4,u8:422.4}
        OPTIMIZATION_CAPABILITIES : FP32 BIN FP16 EXPORT_IMPORT
        GPU_DEVICE_TOTAL_MEM_SIZE : 3379195904
        GPU_UARCH_VERSION : 9.0.0
        GPU_EXECUTION_UNITS_COUNT : 24
        GPU_MEMORY_STATISTICS : ""
    MUTABLE PROPERTIES:
        PERF_COUNT : NO
        MODEL_PRIORITY : MEDIUM
        GPU_HOST_TASK_PRIORITY : MEDIUM
        GPU_QUEUE_PRIORITY : MEDIUM
        GPU_QUEUE_THROTTLE : MEDIUM
        GPU_ENABLE_LOOP_UNROLLING : YES
        GPU_DISABLE_WINOGRAD_CONVOLUTION : NO
        CACHE_DIR : ""
        CACHE_MODE : optimize_speed
        PERFORMANCE_HINT : LATENCY
        EXECUTION_MODE_HINT : PERFORMANCE
        COMPILATION_NUM_THREADS : 8
        NUM_STREAMS : 1
        PERFORMANCE_HINT_NUM_REQUESTS : 0
        INFERENCE_PRECISION_HINT : f16
        ENABLE_CPU_PINNING : NO
        DEVICE_ID : 0
Performance issue description
Step-by-step reproduction
The IR model is based on CodeFormer and compressed to FP16. This issue is related only to configuration settings, irrespective of the model being used for inference.
When running inference with the model compiled for the CPU (device_name="CPU"), the time taken is 4.196 secs, and with the model compiled for the GPU (device_name="GPU"), the time taken is 22.595 secs.
Is this because of a wrong configuration or because of the integrated GPU's limitations? Are there any other suggestions to improve the performance? Does the API automatically choose the best settings based on the device type, or does one have to set things manually to get optimal performance?
Cheers