@yury-gorbachev, who could help with this?
@wangleis , can you help?
@yury-gorbachev @raymondlo84 I'm using this notebook, https://github.com/openvinotoolkit/openvino_notebooks/blob/main/notebooks/226-yolov7-optimization/226-yolov7-optimization.ipynb
core = Core()
# read converted model
# path of the object detector
model = core.read_model("object_detector/yolov7xint.xml")
# load model on CPU device
model = core.compile_model(model, 'CPU')
It returns bboxes, scores, and labels. Is there anything else I need to do here?
Hi, the last two cells of this notebook run the benchmark app. Could you please share their results for each of the configurations you are referring to? Thanks
@yury-gorbachev here it is:
[Step 1/11] Parsing and validating input arguments
[ INFO ] Parsing input parameters
[Step 2/11] Loading OpenVINO Runtime
[ INFO ] OpenVINO:
[ INFO ] Build ................................. 2022.3.0-9052-9752fafe8eb-releases/2022/3
[ INFO ]
[ INFO ] Device info:
[ INFO ] CPU
[ INFO ] Build ................................. 2022.3.0-9052-9752fafe8eb-releases/2022/3
[ INFO ]
[ INFO ]
[Step 3/11] Setting device configuration
[ WARNING ] Performance hint was not explicitly specified in command line. Device(CPU) performance hint will be set to THROUGHPUT.
[Step 4/11] Reading model files
[ INFO ] Loading model files
[ INFO ] Read model took 80.23 ms
[ INFO ] Original model I/O parameters:
[ INFO ] Model inputs:
[ INFO ] images (node: images) : f32 / [...] / [1,3,640,640]
[ INFO ] Model outputs:
[ INFO ] output (node: output) : f32 / [...] / [1,25200,56]
[Step 5/11] Resizing model to match image sizes and given batch
[ INFO ] Model batch size: 1
[Step 6/11] Configuring input of the model
[ INFO ] Model inputs:
[ INFO ] images (node: images) : u8 / [N,C,H,W] / [1,3,640,640]
[ INFO ] Model outputs:
[ INFO ] output (node: output) : f32 / [...] / [1,25200,56]
[Step 7/11] Loading the model to the device
[ INFO ] Compile model took 967.58 ms
[Step 8/11] Querying optimal runtime parameters
[ INFO ] Model:
[ INFO ] NETWORK_NAME: torch_jit
[ INFO ] OPTIMAL_NUMBER_OF_INFER_REQUESTS: 8
[ INFO ] NUM_STREAMS: 8
[ INFO ] AFFINITY: Affinity.CORE
[ INFO ] INFERENCE_NUM_THREADS: 32
[ INFO ] PERF_COUNT: False
[ INFO ] INFERENCE_PRECISION_HINT: <Type: 'float32'>
[ INFO ] PERFORMANCE_HINT: PerformanceMode.THROUGHPUT
[ INFO ] PERFORMANCE_HINT_NUM_REQUESTS: 0
[Step 9/11] Creating infer requests and preparing input tensors
[ WARNING ] No input files were given for input 'images'!. This input will be filled with random values!
[ INFO ] Fill input 'images' with random values
[Step 10/11] Measuring performance (Start inference asynchronously, 8 inference requests, limits: 60000 ms duration)
[ INFO ] Benchmarking in inference only mode (inputs filling are not included in measurement loop).
[ INFO ] First inference took 450.73 ms
[Step 11/11] Dumping statistics report
[ INFO ] Count: 912 iterations
[ INFO ] Duration: 60855.28 ms
[ INFO ] Latency:
[ INFO ] Median: 534.48 ms
[ INFO ] Average: 532.93 ms
[ INFO ] Min: 516.94 ms
[ INFO ] Max: 646.95 ms
[ INFO ] Throughput: 14.99 FPS
So, as you can see, it is 15 FPS throughput on your configuration, even with fp32. In some configurations (especially with a large number of cores) you need to run multiple requests in parallel, asynchronously, to fully utilize all CPU cores. You can find more details here: https://docs.openvino.ai/latest/openvino_docs_deployment_optimization_guide_dldt_optimization_guide.html
@yury-gorbachev sorry, I'm not understanding this fp32 part. I'm using an INT8 model, so why is it using fp32?
Yes, you are right, this is a confusing message in the app; I thought it was an fp32 model. Regardless, you have 15 FPS, which is higher than 6. So it seems it is your application design that needs to be more parallel. For instance, running tracking in parallel with detection is usually good practice to keep more parallel load.
@yury-gorbachev thanks, I'll try AsyncInferQueue
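For reference, a minimal AsyncInferQueue sketch for that could look roughly like this (assumptions: frames is an iterable of preprocessed [1,3,640,640] arrays and the model path is the one from the earlier message; this is a sketch, not the notebook's code):
from openvino.runtime import Core, AsyncInferQueue

core = Core()
model = core.read_model("object_detector/yolov7xint.xml")  # path from the earlier message
compiled = core.compile_model(model, "CPU", {"PERFORMANCE_HINT": "THROUGHPUT"})

# size the queue to the plugin's suggested number of parallel requests
num_requests = compiled.get_property("OPTIMAL_NUMBER_OF_INFER_REQUESTS")
queue = AsyncInferQueue(compiled, num_requests)

results = []
def on_done(request, frame_id):
    # raw [1, 25200, 56] output; do postprocessing (NMS etc.) outside the callback
    results.append((frame_id, request.get_output_tensor().data.copy()))
queue.set_callback(on_done)

for frame_id, frame in enumerate(frames):  # frames: assumed preprocessed inputs
    queue.start_async({0: frame}, userdata=frame_id)
queue.wait_all()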
@akashAD98 Do you mean the performance of the command below?
# Inference INT8 model (OpenVINO IR)
!benchmark_app -m model/yolov7-tiny_int8.xml -d CPU -api async
If yes, could you share the whole benchmark_app log and the output of 'lscpu -e' on all the different machines you mentioned?
@wangleis yes, sharing them below.
I tried this repo for asynchronous mode: https://github.com/OpenVINO-dev-contest/YOLOv7_OpenVINO_cpp-python/tree/main/python. FPS increased by only 2; now I'm getting 8 FPS.
32-core machine
CPU NODE SOCKET CORE L1d:L1i:L2:L3 ONLINE
0 0 0 0 0:0:0:0 yes
1 0 0 0 0:0:0:0 yes
2 0 0 1 1:1:1:0 yes
3 0 0 1 1:1:1:0 yes
4 0 0 2 2:2:2:0 yes
5 0 0 2 2:2:2:0 yes
6 0 0 3 3:3:3:0 yes
7 0 0 3 3:3:3:0 yes
8 0 0 4 4:4:4:0 yes
9 0 0 4 4:4:4:0 yes
10 0 0 5 5:5:5:0 yes
11 0 0 5 5:5:5:0 yes
12 0 0 6 6:6:6:0 yes
13 0 0 6 6:6:6:0 yes
14 0 0 7 7:7:7:0 yes
15 0 0 7 7:7:7:0 yes
16 0 0 8 8:8:8:0 yes
17 0 0 8 8:8:8:0 yes
18 0 0 9 9:9:9:0 yes
19 0 0 9 9:9:9:0 yes
20 0 0 10 10:10:10:0 yes
21 0 0 10 10:10:10:0 yes
22 0 0 11 11:11:11:0 yes
23 0 0 11 11:11:11:0 yes
24 0 0 12 12:12:12:0 yes
25 0 0 12 12:12:12:0 yes
26 0 0 13 13:13:13:0 yes
27 0 0 13 13:13:13:0 yes
28 0 0 14 14:14:14:0 yes
29 0 0 14 14:14:14:0 yes
30 0 0 15 15:15:15:0 yes
31 0 0 15 15:15:15:0 yes
[Step 1/11] Parsing and validating input arguments
[ INFO ] Parsing input parameters
[Step 2/11] Loading OpenVINO Runtime
[ INFO ] OpenVINO:
[ INFO ] Build ................................. 2022.3.0-9052-9752fafe8eb-releases/2022/3
[ INFO ]
[ INFO ] Device info:
[ INFO ] CPU
[ INFO ] Build ................................. 2022.3.0-9052-9752fafe8eb-releases/2022/3
[ INFO ]
[ INFO ]
[Step 3/11] Setting device configuration
[ WARNING ] Performance hint was not explicitly specified in command line. Device(CPU) performance hint will be set to THROUGHPUT.
[Step 4/11] Reading model files
[ INFO ] Loading model files
[ INFO ] Read model took 81.09 ms
[ INFO ] Original model I/O parameters:
[ INFO ] Model inputs:
[ INFO ] images (node: images) : f32 / [...] / [1,3,640,640]
[ INFO ] Model outputs:
[ INFO ] output (node: output) : f32 / [...] / [1,25200,56]
[Step 5/11] Resizing model to match image sizes and given batch
[ INFO ] Model batch size: 1
[Step 6/11] Configuring input of the model
[ INFO ] Model inputs:
[ INFO ] images (node: images) : u8 / [N,C,H,W] / [1,3,640,640]
[ INFO ] Model outputs:
[ INFO ] output (node: output) : f32 / [...] / [1,25200,56]
[Step 7/11] Loading the model to the device
[ INFO ] Compile model took 975.39 ms
[Step 8/11] Querying optimal runtime parameters
[ INFO ] Model:
[ INFO ] NETWORK_NAME: torch_jit
[ INFO ] OPTIMAL_NUMBER_OF_INFER_REQUESTS: 8
[ INFO ] NUM_STREAMS: 8
[ INFO ] AFFINITY: Affinity.CORE
[ INFO ] INFERENCE_NUM_THREADS: 32
[ INFO ] PERF_COUNT: False
[ INFO ] INFERENCE_PRECISION_HINT: <Type: 'float32'>
[ INFO ] PERFORMANCE_HINT: PerformanceMode.THROUGHPUT
[ INFO ] PERFORMANCE_HINT_NUM_REQUESTS: 0
[Step 9/11] Creating infer requests and preparing input tensors
[ WARNING ] No input files were given for input 'images'!. This input will be filled with random values!
[ INFO ] Fill input 'images' with random values
[Step 10/11] Measuring performance (Start inference asynchronously, 8 inference requests, limits: 60000 ms duration)
[ INFO ] Benchmarking in inference only mode (inputs filling are not included in measurement loop).
[ INFO ] First inference took 457.41 ms
[Step 11/11] Dumping statistics report
[ INFO ] Count: 904 iterations
[ INFO ] Duration: 60806.74 ms
[ INFO ] Latency:
[ INFO ] Median: 537.78 ms
[ INFO ] Average: 537.00 ms
[ INFO ] Min: 517.44 ms
[ INFO ] Max: 598.29 ms
[ INFO ] Throughput: 14.87 FPS
@akashAD98 Thanks for sharing. Could you share the log of the same application on the other 8-core machine?
@wangleis yes, sharing it below. I don't know why, but after adding the config my FPS increased by 2.
Earlier I was using:
model = core.compile_model(model, 'CPU')
Then I tried this:
model = core.compile_model(model, config={"CPU":"INFERENCE_NUM_THREADS 32"})
The green result is the one with the config.
Earlier I was getting 5 FPS, now 8 FPS on 32 cores. Is there any way I can directly add asynchronous inference to my model, or any other solution for this?
I have set up OpenVINO on another machine, but I'm getting a "command not found" error for the benchmark command.
16-core machine
CPU NODE SOCKET CORE L1d:L1i:L2:L3 ONLINE
0 0 0 0 0:0:0:0 yes
1 0 0 0 0:0:0:0 yes
2 0 0 1 1:1:1:0 yes
3 0 0 1 1:1:1:0 yes
4 0 0 2 2:2:2:0 yes
5 0 0 2 2:2:2:0 yes
6 0 0 3 3:3:3:0 yes
7 0 0 3 3:3:3:0 yes
8 0 0 4 4:4:4:0 yes
9 0 0 4 4:4:4:0 yes
10 0 0 5 5:5:5:0 yes
11 0 0 5 5:5:5:0 yes
12 0 0 6 6:6:6:0 yes
13 0 0 6 6:6:6:0 yes
14 0 0 7 7:7:7:0 yes
15 0 0 7 7:7:7:0 yes
[Step 1/11] Parsing and validating input arguments
[ INFO ] Parsing input parameters
[Step 2/11] Loading OpenVINO Runtime
[ INFO ] OpenVINO:
[ INFO ] Build ................................. 2022.3.0-9052-9752fafe8eb-releases/2022/3
[ INFO ]
[ INFO ] Device info:
[ INFO ] CPU
[ INFO ] Build ................................. 2022.3.0-9052-9752fafe8eb-releases/2022/3
[ INFO ]
[ INFO ]
[Step 3/11] Setting device configuration
[ WARNING ] Performance hint was not explicitly specified in command line. Device(CPU) performance hint will be set to THROUGHPUT.
[Step 4/11] Reading model files
[ INFO ] Loading model files
[ INFO ] Read model took 80.84 ms
[ INFO ] Original model I/O parameters:
[ INFO ] Model inputs:
[ INFO ] images (node: images) : f32 / [...] / [1,3,640,640]
[ INFO ] Model outputs:
[ INFO ] output (node: output) : f32 / [...] / [1,25200,56]
[Step 5/11] Resizing model to match image sizes and given batch
[ INFO ] Model batch size: 1
[Step 6/11] Configuring input of the model
[ INFO ] Model inputs:
[ INFO ] images (node: images) : u8 / [N,C,H,W] / [1,3,640,640]
[ INFO ] Model outputs:
[ INFO ] output (node: output) : f32 / [...] / [1,25200,56]
[Step 7/11] Loading the model to the device
[ INFO ] Compile model took 923.75 ms
[Step 8/11] Querying optimal runtime parameters
[ INFO ] Model:
[ INFO ] NETWORK_NAME: torch_jit
[ INFO ] OPTIMAL_NUMBER_OF_INFER_REQUESTS: 4
[ INFO ] NUM_STREAMS: 4
[ INFO ] AFFINITY: Affinity.CORE
[ INFO ] INFERENCE_NUM_THREADS: 16
[ INFO ] PERF_COUNT: False
[ INFO ] INFERENCE_PRECISION_HINT: <Type: 'float32'>
[ INFO ] PERFORMANCE_HINT: PerformanceMode.THROUGHPUT
[ INFO ] PERFORMANCE_HINT_NUM_REQUESTS: 0
[Step 9/11] Creating infer requests and preparing input tensors
[ WARNING ] No input files were given for input 'images'!. This input will be filled with random values!
[ INFO ] Fill input 'images' with random values
[Step 10/11] Measuring performance (Start inference asynchronously, 4 inference requests, limits: 60000 ms duration)
[ INFO ] Benchmarking in inference only mode (inputs filling are not included in measurement loop).
[ INFO ] First inference took 452.06 ms
[Step 11/11] Dumping statistics report
[ INFO ] Count: 528 iterations
[ INFO ] Duration: 60834.34 ms
[ INFO ] Latency:
[ INFO ] Median: 459.38 ms
[ INFO ] Average: 460.32 ms
[ INFO ] Min: 453.69 ms
[ INFO ] Max: 514.85 ms
[ INFO ] Throughput: 8.68 FPS
@wangleis please check it. I hope this information helps to find the reason.
@akashAD98 The preferred way to configure performance in OpenVINO Runtime is using performance hints. Please find more detail at https://docs.openvino.ai/latest/openvino_docs_OV_UG_Performance_Hints.html
In your case, please use the throughput hint during compile_model() as below. Then provide enough inputs for the optimal number of infer requests, so that all infer requests keep working in parallel.
// it is important to query/create and run the sufficient #requests
auto compiled_model = core.compile_model(model, "CPU",
ov::hint::performance_mode(ov::hint::PerformanceMode::THROUGHPUT));
auto num_requests = compiled_model.get_property(ov::optimal_number_of_infer_requests);
@wangleis I'm using Python, do you mean this? What needs to be set for 32 cores?
I don't know how to assign these values in Python.
// it is important to query/create and run the sufficient #requests
auto compiled_model = core.compile_model(model, "CPU",
ov::hint::performance_mode(ov::hint::PerformanceMode::THROUGHPUT));
auto num_requests = compiled_model.get_property(ov::optimal_number_of_infer_requests);
@akashAD98, if you go to the link above, you can check how to set them up in Python. Just click on the Python tab.
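Roughly, the Python counterpart of that C++ snippet looks like this (a sketch using the string-keyed config form; the model path is just the one from earlier in the thread):
from openvino.runtime import Core

core = Core()
model = core.read_model("object_detector/yolov7_int8new.xml")  # your model path

# Python equivalent of ov::hint::performance_mode(ov::hint::PerformanceMode::THROUGHPUT)
compiled_model = core.compile_model(model, "CPU", {"PERFORMANCE_HINT": "THROUGHPUT"})

# Python equivalent of ov::optimal_number_of_infer_requests
num_requests = compiled_model.get_property("OPTIMAL_NUMBER_OF_INFER_REQUESTS")
print("optimal number of infer requests:", num_requests)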
@akashAD98 Please also refer to the "Inference with THROUGHPUT hint" part of the OpenVINO notebook: https://github.com/openvinotoolkit/openvino_notebooks/blob/main/notebooks/106-auto-device/106-auto-device.ipynb
@wangleis @adrianboguszewski
Using this I got fewer FPS: 3.4 FPS on 32 cores.
config = {"PERFORMANCE_HINT": "THROUGHPUT",
# "PERFORMANCE_HINT_NUM_REQUESTS": "32"}
model = core.compile_model(model, "CPU", config)
Using the below I got 8.83 FPS:
model = core.read_model("object_detector/yolov7_int8new.xml")
# load model on CPU device
#model = core.compile_model(model, 'CPU')
model = core.compile_model(model,config={"CPU":"INFERENCE_NUM_THREADS 32"})
@akashAD98 If you want to do inference on CPU, please use
config = {"PERFORMANCE_HINT": "THROUGHPUT"}
compiled_model = core.compile_model(model, "CPU", config)
When you remove the target device from compile_model(), OpenVINO Runtime will identify a target device itself and may load the network onto a GPU.
@wangleis I tried it on CPU and I'm getting very low FPS. How can I achieve more FPS? I'm getting only 5-6 FPS on the 32-core machine.
@akashAD98 May I know if your machine is a bare-metal Core platform or a VM/Docker environment?
@wangleis it's a VM from one of the cloud providers. I have tested machines with different core counts, and I get lower FPS as the number of cores decreases.
@akashAD98 For benchmarking on this platform, please use benchmark_app -m model/yolov7-tiny_int8.xml -d CPU -hint throughput -pin NUMA
as the last command at https://github.com/openvinotoolkit/openvino_notebooks/blob/main/notebooks/226-yolov7-optimization/226-yolov7-optimization.ipynb
For your application, please use
setting = {"PERFORMANCE_HINT":"THROUGHPUT", "AFFINITY":"NUMA"}
compiled_model = core.compile_model(model=model, device_name="CPU", config=setting)
@wangleis for the benchmark I'm getting this error:
benchmark_app -m yolov7_int8new.xml -d CPU -hint throughput -pin NUMA
[Step 1/11] Parsing and validating input arguments
[ INFO ] Parsing input parameters
[ ERROR ] -nstreams, -nthreads and -pin options are fine tune options. To use them you should explicitely set -hint option to none. This is not OpenVINO limitation (those options can be used in OpenVINO together), but a benchmark_app UI rule.
Traceback (most recent call last):
File "/home/refx/yolov7_tracker/openvino_env/lib/python3.8/site-packages/openvino/tools/benchmark/main.py", line 59, in main
args, is_network_compiled = parse_and_check_command_line()
File "/home/refx/yolov7_tracker/openvino_env/lib/python3.8/site-packages/openvino/tools/benchmark/main.py", line 35, in parse_and_check_command_line
raise Exception("-nstreams, -nthreads and -pin options are fine tune options. To use them you " \
Exception: -nstreams, -nthreads and -pin options are fine tune options. To use them you should explicitely set -hint option to none. This is not OpenVINO limitation (those options can be used in OpenVINO together), but a benchmark_app UI rule.
And I'm getting lower FPS with this:
setting = {"PERFORMANCE_HINT":"THROUGHPUT", "AFFINITY":"NUMA"}
model = core.compile_model(model=model, device_name="CPU", config=setting)
I also tried to set device_name='GPU' and got this error:
/yolov7_tracker/openvino_env/lib/python3.8/site-packages/openvino/libs/libopenvino_intel_gpu_plugin.so for device GPU
Please, check your environment
Check 'error_code == 0' failed at src/plugins/intel_gpu/src/runtime/ocl/ocl_device_detector.cpp:194:
[GPU] No supported OCL devices found or unexpected error happened during devices query.
[GPU] Please check OpenVINO documentation for GPU drivers setup guide.
[GPU] clGetPlatformIDs error code: -1001
Out of all of these, this one gives the highest FPS, but as you mentioned it uses GPU by default. I don't have a GPU, so how is it working?
model = core.compile_model(model,config={"CPU":"INFERENCE_NUM_THREADS 32"})
@akashAD98 For benchmarking on this platform, please use benchmark_app -m model/yolov7-tiny_int8.xml -d CPU -hint none -nthreads 32 -pin NUMA
When you call compile_model() without a device name, OpenVINO will use the AUTO device by default. AUTO will identify and use the devices available in the system, not just the CPU. Please learn more from https://docs.openvino.ai/latest/openvino_docs_OV_UG_supported_plugins_AUTO.html.
If there is no GPU and the inference runs on the CPU, that's fine. Sorry for the misleading suggestion.
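To double-check which devices AUTO can see on your VM, you can list the available devices directly (a quick, model-independent check):
from openvino.runtime import Core

core = Core()
# on a CPU-only VM this typically prints ['CPU'], so AUTO falls back to the CPU
print(core.available_devices)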
@wangleis thanks for your kind and fast reply.
I tested it on the 32-core machine and this is the FPS I'm getting for the yolov7int8.xml model:
[Step 1/11] Parsing and validating input arguments
[ INFO ] Parsing input parameters
[Step 2/11] Loading OpenVINO Runtime
[ INFO ] OpenVINO:
[ INFO ] Build ................................. 2022.3.0-9052-9752fafe8eb-releases/2022/3
[ INFO ]
[ INFO ] Device info:
[ INFO ] CPU
[ INFO ] Build ................................. 2022.3.0-9052-9752fafe8eb-releases/2022/3
[ INFO ]
[ INFO ]
[Step 3/11] Setting device configuration
[ WARNING ] No device CPU performance hint is set.
[ WARNING ] -nstreams default value is determined automatically for CPU device. Although the automatic selection usually provides a reasonable performance, but it still may be non-optimal for some cases, for more information look at README.
[Step 4/11] Reading model files
[ INFO ] Loading model files
[ INFO ] Read model took 60.56 ms
[ INFO ] Original model I/O parameters:
[ INFO ] Model inputs:
[ INFO ] images (node: images) : f32 / [...] / [1,3,640,640]
[ INFO ] Model outputs:
[ INFO ] output (node: output) : f32 / [...] / [1,25200,85]
[Step 5/11] Resizing model to match image sizes and given batch
[ INFO ] Model batch size: 1
[Step 6/11] Configuring input of the model
[ INFO ] Model inputs:
[ INFO ] images (node: images) : u8 / [N,C,H,W] / [1,3,640,640]
[ INFO ] Model outputs:
[ INFO ] output (node: output) : f32 / [...] / [1,25200,85]
[Step 7/11] Loading the model to the device
[ INFO ] Compile model took 771.48 ms
[Step 8/11] Querying optimal runtime parameters
[ INFO ] Model:
[ INFO ] NETWORK_NAME: torch_jit
[ INFO ] OPTIMAL_NUMBER_OF_INFER_REQUESTS: 8
[ INFO ] NUM_STREAMS: 8
[ INFO ] AFFINITY: Affinity.NUMA
[ INFO ] INFERENCE_NUM_THREADS: 32
[ INFO ] PERF_COUNT: False
[ INFO ] INFERENCE_PRECISION_HINT: <Type: 'float32'>
[ INFO ] PERFORMANCE_HINT: PerformanceMode.UNDEFINED
[ INFO ] PERFORMANCE_HINT_NUM_REQUESTS: 0
[Step 9/11] Creating infer requests and preparing input tensors
[ WARNING ] No input files were given for input 'images'!. This input will be filled with random values!
[ INFO ] Fill input 'images' with random values
[Step 10/11] Measuring performance (Start inference asynchronously, 8 inference requests using 8 streams for CPU, limits: 60000 ms duration)
[ INFO ] Benchmarking in inference-only mode (inputs filling are not included in measurement loop).
[ INFO ] First inference took 142.49 ms
[Step 11/11] Dumping statistics report
[ INFO ] Count: 1576 iterations
[ INFO ] Duration: 60502.22 ms
[ INFO ] Latency:
[ INFO ] Median: 306.83 ms
[ INFO ] Average: 306.65 ms
[ INFO ] Min: 270.56 ms
[ INFO ] Max: 333.05 ms
[ INFO ] Throughput: 26.05 FPS
How can I achieve more FPS? I'm not able to get more than 10 FPS.
@akashAD98 Please refer to the following suggestion for compile_model():
setting = {"INFERENCE_NUM_THREADS":"32", "AFFINITY":"NUMA"}
compiled_model = core.compile_model(model=model, device_name="CPU", config=setting)
Closing because of the lack of activity. Please reopen if needed.
I'm using a yolov7xcustom.xml file for object detection and tracking, and I'm getting the same FPS on different machines. May I know what the problem is? I guess it's only using one or two cores out of all the available cores.
For example, I tried this machine: