openvinotoolkit / openvino_notebooks

📚 Jupyter notebook tutorials for OpenVINO™

I'm getting very low or similar FPS across different core counts; tried 8-, 16-, and 32-core machines #916

Closed akashAD98 closed 1 year ago

akashAD98 commented 1 year ago

I'm using a yolov7xcustom.xml file for object detection & tracking, and I'm getting the same FPS on different machines. May I know what the problem is? I guess it's only using 1 or 2 cores out of all the available cores.

For example, I tried this machine:


qa-model-test-linux-32-core
Type / RAM / Cores: Standard D32s v3 (32 vCPUs, 128 GiB memory)
adrianboguszewski commented 1 year ago

@yury-gorbachev, who could help with this?

akashAD98 commented 1 year ago
[screenshots attached]
yury-gorbachev commented 1 year ago

@wangleis , can you help?

akashAD98 commented 1 year ago

@yury-gorbachev @raymondlo84 I'm using this notebook: https://github.com/openvinotoolkit/openvino_notebooks/blob/main/notebooks/226-yolov7-optimization/226-yolov7-optimization.ipynb

from openvino.runtime import Core

core = Core()
# read the converted model (path to the object detector IR)
model = core.read_model("object_detector/yolov7xint.xml")
# load the model on the CPU device
model = core.compile_model(model, 'CPU')

and I'm returning bbox, score, and labels. Is there anything else I need to do here?

yury-gorbachev commented 1 year ago

Hi, the last two cells of this notebook run the benchmark app. Could you please share those results for the respective configurations you refer to? Thanks

akashAD98 commented 1 year ago

@yury-gorbachev here it is:

[Step 1/11] Parsing and validating input arguments
[ INFO ] Parsing input parameters
[Step 2/11] Loading OpenVINO Runtime
[ INFO ] OpenVINO:
[ INFO ] Build ................................. 2022.3.0-9052-9752fafe8eb-releases/2022/3
[ INFO ]
[ INFO ] Device info:
[ INFO ] CPU
[ INFO ] Build ................................. 2022.3.0-9052-9752fafe8eb-releases/2022/3
[ INFO ]
[ INFO ]
[Step 3/11] Setting device configuration
[ WARNING ] Performance hint was not explicitly specified in command line. Device(CPU) performance hint will be set to THROUGHPUT.
[Step 4/11] Reading model files
[ INFO ] Loading model files
[ INFO ] Read model took 80.23 ms
[ INFO ] Original model I/O parameters:
[ INFO ] Model inputs:
[ INFO ]     images (node: images) : f32 / [...] / [1,3,640,640]
[ INFO ] Model outputs:
[ INFO ]     output (node: output) : f32 / [...] / [1,25200,56]
[Step 5/11] Resizing model to match image sizes and given batch
[ INFO ] Model batch size: 1
[Step 6/11] Configuring input of the model
[ INFO ] Model inputs:
[ INFO ]     images (node: images) : u8 / [N,C,H,W] / [1,3,640,640]
[ INFO ] Model outputs:
[ INFO ]     output (node: output) : f32 / [...] / [1,25200,56]
[Step 7/11] Loading the model to the device
[ INFO ] Compile model took 967.58 ms
[Step 8/11] Querying optimal runtime parameters
[ INFO ] Model:
[ INFO ]   NETWORK_NAME: torch_jit
[ INFO ]   OPTIMAL_NUMBER_OF_INFER_REQUESTS: 8
[ INFO ]   NUM_STREAMS: 8
[ INFO ]   AFFINITY: Affinity.CORE
[ INFO ]   INFERENCE_NUM_THREADS: 32
[ INFO ]   PERF_COUNT: False
[ INFO ]   INFERENCE_PRECISION_HINT: <Type: 'float32'>
[ INFO ]   PERFORMANCE_HINT: PerformanceMode.THROUGHPUT
[ INFO ]   PERFORMANCE_HINT_NUM_REQUESTS: 0
[Step 9/11] Creating infer requests and preparing input tensors
[ WARNING ] No input files were given for input 'images'!. This input will be filled with random values!
[ INFO ] Fill input 'images' with random values
[Step 10/11] Measuring performance (Start inference asynchronously, 8 inference requests, limits: 60000 ms duration)
[ INFO ] Benchmarking in inference only mode (inputs filling are not included in measurement loop).
[ INFO ] First inference took 450.73 ms
[Step 11/11] Dumping statistics report
[ INFO ] Count:            912 iterations
[ INFO ] Duration:         60855.28 ms
[ INFO ] Latency:
[ INFO ]    Median:        534.48 ms
[ INFO ]    Average:       532.93 ms
[ INFO ]    Min:           516.94 ms
[ INFO ]    Max:           646.95 ms
[ INFO ] Throughput:   14.99 FPS
yury-gorbachev commented 1 year ago

So as you can see, it is 15 FPS throughput on your configuration, even with fp32. In some configurations (especially with a large number of cores) you need to run multiple requests in parallel, asynchronously, to fully utilize all cores of the CPU. You can find more details here: https://docs.openvino.ai/latest/openvino_docs_deployment_optimization_guide_dldt_optimization_guide.html
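For illustration, a minimal sketch of keeping several requests in flight (the model path is just the one from your snippet; this is an assumption based on the 2022.3 openvino.runtime Python API, not code from the notebook):

import numpy as np
from openvino.runtime import Core

core = Core()
model = core.read_model("object_detector/yolov7xint.xml")
# the THROUGHPUT hint lets the CPU plugin create several execution streams
compiled_model = core.compile_model(model, "CPU", {"PERFORMANCE_HINT": "THROUGHPUT"})

num_requests = compiled_model.get_property("OPTIMAL_NUMBER_OF_INFER_REQUESTS")
requests = [compiled_model.create_infer_request() for _ in range(num_requests)]

blob = np.random.rand(1, 3, 640, 640).astype(np.float32)  # stand-in input
for req in requests:
    req.start_async({0: blob})      # submit without blocking
for req in requests:
    req.wait()                      # collect once each request finishes
    output = req.get_output_tensor(0).data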

akashAD98 commented 1 year ago

@yury-gorbachev sorry, I'm not understanding this fp32. I'm using an int8 model, so why is it using fp32?

yury-gorbachev commented 1 year ago

Yes, you are right, that is a confusing message in the app; I thought it was an fp32 model. Regardless, you get 15 FPS, which is higher than 6. So it seems the application design needs to be more parallel. For instance, running tracking in parallel with detection is usually good practice to keep more parallel load, e.g. along the lines of the sketch below.
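A rough sketch of overlapping the two stages with a worker thread; detect() and track() stand for your own functions and are not part of the notebook:

from concurrent.futures import ThreadPoolExecutor

def run_pipeline(frames, detect, track):
    results = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = None                        # tracking job for the previous frame
        for frame in frames:
            detections = detect(frame)        # inference for the current frame
            if pending is not None:
                results.append(pending.result())
            # start tracking this frame while the next detection runs
            pending = pool.submit(track, frame, detections)
        if pending is not None:
            results.append(pending.result())
    return results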

akashAD98 commented 1 year ago

@yury-gorbachev thanks, I'll try AsyncInferQueue

wangleis commented 1 year ago

@akashAD98 Do you mean the performance of the command below?

# Inference INT8 model (OpenVINO IR)
!benchmark_app -m model/yolov7-tiny_int8.xml -d CPU -api async

If yes, could you share the whole logs of benchmark_app and 'lscpu -e' on all the different machines you mentioned?

akashAD98 commented 1 year ago

@wangleis yes, I'm sharing them below

akashAD98 commented 1 year ago

I tried this repo for asynchronous mode: https://github.com/OpenVINO-dev-contest/YOLOv7_OpenVINO_cpp-python/tree/main/python. The FPS increased by only 2; I'm now getting 8 FPS.

akashAD98 commented 1 year ago

32-core machine:

CPU NODE SOCKET CORE L1d:L1i:L2:L3 ONLINE
  0    0      0    0 0:0:0:0          yes
  1    0      0    0 0:0:0:0          yes
  2    0      0    1 1:1:1:0          yes
  3    0      0    1 1:1:1:0          yes
  4    0      0    2 2:2:2:0          yes
  5    0      0    2 2:2:2:0          yes
  6    0      0    3 3:3:3:0          yes
  7    0      0    3 3:3:3:0          yes
  8    0      0    4 4:4:4:0          yes
  9    0      0    4 4:4:4:0          yes
 10    0      0    5 5:5:5:0          yes
 11    0      0    5 5:5:5:0          yes
 12    0      0    6 6:6:6:0          yes
 13    0      0    6 6:6:6:0          yes
 14    0      0    7 7:7:7:0          yes
 15    0      0    7 7:7:7:0          yes
 16    0      0    8 8:8:8:0          yes
 17    0      0    8 8:8:8:0          yes
 18    0      0    9 9:9:9:0          yes
 19    0      0    9 9:9:9:0          yes
 20    0      0   10 10:10:10:0       yes
 21    0      0   10 10:10:10:0       yes
 22    0      0   11 11:11:11:0       yes
 23    0      0   11 11:11:11:0       yes
 24    0      0   12 12:12:12:0       yes
 25    0      0   12 12:12:12:0       yes
 26    0      0   13 13:13:13:0       yes
 27    0      0   13 13:13:13:0       yes
 28    0      0   14 14:14:14:0       yes
 29    0      0   14 14:14:14:0       yes
 30    0      0   15 15:15:15:0       yes
 31    0      0   15 15:15:15:0       yes
[Step 1/11] Parsing and validating input arguments
[ INFO ] Parsing input parameters
[Step 2/11] Loading OpenVINO Runtime
[ INFO ] OpenVINO:
[ INFO ] Build ................................. 2022.3.0-9052-9752fafe8eb-releases/2022/3
[ INFO ]
[ INFO ] Device info:
[ INFO ] CPU
[ INFO ] Build ................................. 2022.3.0-9052-9752fafe8eb-releases/2022/3
[ INFO ]
[ INFO ]
[Step 3/11] Setting device configuration
[ WARNING ] Performance hint was not explicitly specified in command line. Device(CPU) performance hint will be set to THROUGHPUT.
[Step 4/11] Reading model files
[ INFO ] Loading model files
[ INFO ] Read model took 81.09 ms
[ INFO ] Original model I/O parameters:
[ INFO ] Model inputs:
[ INFO ]     images (node: images) : f32 / [...] / [1,3,640,640]
[ INFO ] Model outputs:
[ INFO ]     output (node: output) : f32 / [...] / [1,25200,56]
[Step 5/11] Resizing model to match image sizes and given batch
[ INFO ] Model batch size: 1
[Step 6/11] Configuring input of the model
[ INFO ] Model inputs:
[ INFO ]     images (node: images) : u8 / [N,C,H,W] / [1,3,640,640]
[ INFO ] Model outputs:
[ INFO ]     output (node: output) : f32 / [...] / [1,25200,56]
[Step 7/11] Loading the model to the device
[ INFO ] Compile model took 975.39 ms
[Step 8/11] Querying optimal runtime parameters
[ INFO ] Model:
[ INFO ]   NETWORK_NAME: torch_jit
[ INFO ]   OPTIMAL_NUMBER_OF_INFER_REQUESTS: 8
[ INFO ]   NUM_STREAMS: 8
[ INFO ]   AFFINITY: Affinity.CORE
[ INFO ]   INFERENCE_NUM_THREADS: 32
[ INFO ]   PERF_COUNT: False
[ INFO ]   INFERENCE_PRECISION_HINT: <Type: 'float32'>
[ INFO ]   PERFORMANCE_HINT: PerformanceMode.THROUGHPUT
[ INFO ]   PERFORMANCE_HINT_NUM_REQUESTS: 0
[Step 9/11] Creating infer requests and preparing input tensors
[ WARNING ] No input files were given for input 'images'!. This input will be filled with random values!
[ INFO ] Fill input 'images' with random values
[Step 10/11] Measuring performance (Start inference asynchronously, 8 inference requests, limits: 60000 ms duration)
[ INFO ] Benchmarking in inference only mode (inputs filling are not included in measurement loop).
[ INFO ] First inference took 457.41 ms
[Step 11/11] Dumping statistics report
[ INFO ] Count:            904 iterations
[ INFO ] Duration:         60806.74 ms
[ INFO ] Latency:
[ INFO ]    Median:        537.78 ms
[ INFO ]    Average:       537.00 ms
[ INFO ]    Min:           517.44 ms
[ INFO ]    Max:           598.29 ms
[ INFO ] Throughput:   14.87 FPS
wangleis commented 1 year ago

@akashAD98 Thanks for sharing. Could you share the log of the same application on an 8-core machine?

akashAD98 commented 1 year ago

@wangleis yes, I'm sharing them. I don't know why, but after adding a config my FPS increased by 2.

Earlier I was using:

model = core.compile_model(model, 'CPU')

Then I tried this:

model = core.compile_model(model, config={"CPU":"INFERENCE_NUM_THREADS 32"})

The green one is with the config:

[screenshot: FPS comparison]

Earlier I was getting 5 FPS, now 8 FPS on 32 cores. Is there a way to add asynchronous execution to my model directly, or any other solution for this?

I have set up OpenVINO on another machine, but I'm getting a 'command not found' error for benchmark_app.

16-core machine:

CPU NODE SOCKET CORE L1d:L1i:L2:L3 ONLINE
  0    0      0    0 0:0:0:0          yes
  1    0      0    0 0:0:0:0          yes
  2    0      0    1 1:1:1:0          yes
  3    0      0    1 1:1:1:0          yes
  4    0      0    2 2:2:2:0          yes
  5    0      0    2 2:2:2:0          yes
  6    0      0    3 3:3:3:0          yes
  7    0      0    3 3:3:3:0          yes
  8    0      0    4 4:4:4:0          yes
  9    0      0    4 4:4:4:0          yes
 10    0      0    5 5:5:5:0          yes
 11    0      0    5 5:5:5:0          yes
 12    0      0    6 6:6:6:0          yes
 13    0      0    6 6:6:6:0          yes
 14    0      0    7 7:7:7:0          yes
 15    0      0    7 7:7:7:0          yes
[Step 1/11] Parsing and validating input arguments
[ INFO ] Parsing input parameters
[Step 2/11] Loading OpenVINO Runtime
[ INFO ] OpenVINO:
[ INFO ] Build ................................. 2022.3.0-9052-9752fafe8eb-releases/2022/3
[ INFO ]
[ INFO ] Device info:
[ INFO ] CPU
[ INFO ] Build ................................. 2022.3.0-9052-9752fafe8eb-releases/2022/3
[ INFO ]
[ INFO ]
[Step 3/11] Setting device configuration
[ WARNING ] Performance hint was not explicitly specified in command line. Device(CPU) performance hint will be set to THROUGHPUT.
[Step 4/11] Reading model files
[ INFO ] Loading model files
[ INFO ] Read model took 80.84 ms
[ INFO ] Original model I/O parameters:
[ INFO ] Model inputs:
[ INFO ]     images (node: images) : f32 / [...] / [1,3,640,640]
[ INFO ] Model outputs:
[ INFO ]     output (node: output) : f32 / [...] / [1,25200,56]
[Step 5/11] Resizing model to match image sizes and given batch
[ INFO ] Model batch size: 1
[Step 6/11] Configuring input of the model
[ INFO ] Model inputs:
[ INFO ]     images (node: images) : u8 / [N,C,H,W] / [1,3,640,640]
[ INFO ] Model outputs:
[ INFO ]     output (node: output) : f32 / [...] / [1,25200,56]
[Step 7/11] Loading the model to the device
[ INFO ] Compile model took 923.75 ms
[Step 8/11] Querying optimal runtime parameters
[ INFO ] Model:
[ INFO ]   NETWORK_NAME: torch_jit
[ INFO ]   OPTIMAL_NUMBER_OF_INFER_REQUESTS: 4
[ INFO ]   NUM_STREAMS: 4
[ INFO ]   AFFINITY: Affinity.CORE
[ INFO ]   INFERENCE_NUM_THREADS: 16
[ INFO ]   PERF_COUNT: False
[ INFO ]   INFERENCE_PRECISION_HINT: <Type: 'float32'>
[ INFO ]   PERFORMANCE_HINT: PerformanceMode.THROUGHPUT
[ INFO ]   PERFORMANCE_HINT_NUM_REQUESTS: 0
[Step 9/11] Creating infer requests and preparing input tensors
[ WARNING ] No input files were given for input 'images'!. This input will be filled with random values!
[ INFO ] Fill input 'images' with random values
[Step 10/11] Measuring performance (Start inference asynchronously, 4 inference requests, limits: 60000 ms duration)
[ INFO ] Benchmarking in inference only mode (inputs filling are not included in measurement loop).
[ INFO ] First inference took 452.06 ms
[Step 11/11] Dumping statistics report
[ INFO ] Count:            528 iterations
[ INFO ] Duration:         60834.34 ms
[ INFO ] Latency:
[ INFO ]    Median:        459.38 ms
[ INFO ]    Average:       460.32 ms
[ INFO ]    Min:           453.69 ms
[ INFO ]    Max:           514.85 ms
[ INFO ] Throughput:   8.68 FPS
akashAD98 commented 1 year ago

@wangleis please check it; I hope this information helps to find the reason.

wangleis commented 1 year ago

@akashAD98 The preferred way to configure performance in OpenVINO Runtime is using performance hints. Please find more detail at https://docs.openvino.ai/latest/openvino_docs_OV_UG_Performance_Hints.html

In your case, please use the throughput hint during compile_model() as below. Then provide enough inputs for the optimal number of infer requests, so that all infer requests keep working in parallel.

// it is important to query/create and run the sufficient #requests
auto compiled_model = core.compile_model(model, "CPU",
    ov::hint::performance_mode(ov::hint::PerformanceMode::THROUGHPUT));
auto num_requests = compiled_model.get_property(ov::optimal_number_of_infer_requests);
akashAD98 commented 1 year ago

@wangleis I'm using Python; do you mean this? What do I need to set for 32 cores?

I don't know how to assign these values in Python.

// it is important to query/create and run the sufficient #requests
auto compiled_model = core.compile_model(model, "CPU",
    ov::hint::performance_mode(ov::hint::PerformanceMode::THROUGHPUT));
auto num_requests = compiled_model.get_property(ov::optimal_number_of_infer_requests);
adrianboguszewski commented 1 year ago

@akashAD98, if you go to the link above, you can check how to set them up in Python. Just click on the Python tab.
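For reference, the Python equivalent of the C++ snippet above should look roughly like this (an assumption based on the 2022.3 Python API, not copied from that page):

compiled_model = core.compile_model(model, "CPU", {"PERFORMANCE_HINT": "THROUGHPUT"})
num_requests = compiled_model.get_property("OPTIMAL_NUMBER_OF_INFER_REQUESTS")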

wangleis commented 1 year ago

@akashAD98 Please also refer Inference with THROUGHPUT hint part in OpenVINO notebook https://github.com/openvinotoolkit/openvino_notebooks/blob/main/notebooks/106-auto-device/106-auto-device.ipynb

akashAD98 commented 1 year ago

@wangleis @adrianboguszewski

Using this, I got fewer FPS: 3.4 FPS on 32 cores.

config = {"PERFORMANCE_HINT": "THROUGHPUT",
         # "PERFORMANCE_HINT_NUM_REQUESTS": "32"}
model = core.compile_model(model, "CPU", config)

Using the code below, I got 8.83 FPS:

model = core.read_model("object_detector/yolov7_int8new.xml")
# load model on CPU device
#model = core.compile_model(model, 'CPU')
model = core.compile_model(model,config={"CPU":"INFERENCE_NUM_THREADS 32"}) 
wangleis commented 1 year ago

@akashAD98 If you want to run inference on the CPU, please use:

config = {"PERFORMANCE_HINT": "THROUGHPUT"}
compiled_model = core.compile_model(model, "CPU", config)

When you omit the target device in compile_model(), the OpenVINO runtime will pick a target device itself and may load the network onto a GPU.

akashAD98 commented 1 year ago

@wangleis I tried it on the CPU and I'm getting very low FPS. How can I achieve more FPS? I'm getting only 5-6 FPS on a 32-core machine.

wangleis commented 1 year ago

@akashAD98 May I know if your machine is a bare-metal Core platform or a VM/Docker environment?

akashAD98 commented 1 year ago

@wangleis it's a VM from one of the cloud providers. I have tested machines with different core counts, and the FPS gets lower as the number of cores decreases.

wangleis commented 1 year ago

@akashAD98 For benchmarking on this platform, please use benchmark_app -m model/yolov7-tiny_int8.xml -d CPU -hint throughput -pin NUMA in place of the last command in https://github.com/openvinotoolkit/openvino_notebooks/blob/main/notebooks/226-yolov7-optimization/226-yolov7-optimization.ipynb

For your application, please use

setting = {"PERFORMANCE_HINT":"THROUGHPUT", "AFFINITY":"NUMA"}
compiled_model = core.compile_model(model=model, device_name="CPU", config=setting)
akashAD98 commented 1 year ago

@wangleis for the benchmark I'm getting this error:

benchmark_app -m yolov7_int8new.xml -d CPU -hint throughput -pin NUMA

[Step 1/11] Parsing and validating input arguments
[ INFO ] Parsing input parameters
[ ERROR ] -nstreams, -nthreads and -pin options are fine tune options. To use them you should explicitely set -hint option to none. This is not OpenVINO limitation (those options can be used in OpenVINO together), but a benchmark_app UI rule.
Traceback (most recent call last):
  File "/home/refx/yolov7_tracker/openvino_env/lib/python3.8/site-packages/openvino/tools/benchmark/main.py", line 59, in main
    args, is_network_compiled = parse_and_check_command_line()
  File "/home/refx/yolov7_tracker/openvino_env/lib/python3.8/site-packages/openvino/tools/benchmark/main.py", line 35, in parse_and_check_command_line
    raise Exception("-nstreams, -nthreads and -pin options are fine tune options. To use them you " \
Exception: -nstreams, -nthreads and -pin options are fine tune options. To use them you should explicitely set -hint option to none. This is not OpenVINO limitation (those options can be used in OpenVINO together), but a benchmark_app UI rule.

And I'm getting lower FPS with this:

setting = {"PERFORMANCE_HINT":"THROUGHPUT", "AFFINITY":"NUMA"}
model = core.compile_model(model=model, device_name="CPU", config=setting) 
[screenshot]

I also tried setting device_name='GPU' and got this error:

/yolov7_tracker/openvino_env/lib/python3.8/site-packages/openvino/libs/libopenvino_intel_gpu_plugin.so for device GPU
Please, check your environment
Check 'error_code == 0' failed at src/plugins/intel_gpu/src/runtime/ocl/ocl_device_detector.cpp:194:
[GPU] No supported OCL devices found or unexpected error happened during devices query.
[GPU] Please check OpenVINO documentation for GPU drivers setup guide.
[GPU] clGetPlatformIDs error code: -1001

Out of all of these, the call below gives the highest FPS, but as you mentioned it may use the GPU by default. I don't have a GPU, so how is it working?

model = core.compile_model(model,config={"CPU":"INFERENCE_NUM_THREADS 32"}) 
wangleis commented 1 year ago

@akashAD98 For benchmarking on this platform, please use benchmark_app -m model/yolov7-tiny_int8.xml -d CPU -hint none -nthreads 32 -pin NUMA

When you call compile_model() without a device name, OpenVINO will use the AUTO device by default. AUTO will identify and use the devices available in the system, not just the CPU. Please find more details at https://docs.openvino.ai/latest/openvino_docs_OV_UG_supported_plugins_AUTO.html.

If there is no GPU and the inference runs on the CPU, that's fine. Sorry for the confusion.
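You can also check which devices the runtime actually sees, for example:

from openvino.runtime import Core

core = Core()
# prints e.g. ['CPU'] when no GPU is present, so AUTO falls back to the CPU plugin
print(core.available_devices)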

akashAD98 commented 1 year ago

@wangleis thanks for your kind and fast reply.

I tested it on the 32-core machine and this is the FPS I'm getting for the yolov7int8.xml model:

[Step 1/11] Parsing and validating input arguments
[ INFO ] Parsing input parameters
[Step 2/11] Loading OpenVINO Runtime
[ INFO ] OpenVINO:
[ INFO ] Build ................................. 2022.3.0-9052-9752fafe8eb-releases/2022/3
[ INFO ]
[ INFO ] Device info:
[ INFO ] CPU
[ INFO ] Build ................................. 2022.3.0-9052-9752fafe8eb-releases/2022/3
[ INFO ]
[ INFO ]
[Step 3/11] Setting device configuration
[ WARNING ] No device CPU performance hint is set.
[ WARNING ] -nstreams default value is determined automatically for CPU device. Although the automatic selection usually provides a reasonable performance, but it still may be non-optimal for some cases, for more information look at README.
[Step 4/11] Reading model files
[ INFO ] Loading model files
[ INFO ] Read model took 60.56 ms
[ INFO ] Original model I/O parameters:
[ INFO ] Model inputs:
[ INFO ]     images (node: images) : f32 / [...] / [1,3,640,640]
[ INFO ] Model outputs:
[ INFO ]     output (node: output) : f32 / [...] / [1,25200,85]
[Step 5/11] Resizing model to match image sizes and given batch
[ INFO ] Model batch size: 1
[Step 6/11] Configuring input of the model
[ INFO ] Model inputs:
[ INFO ]     images (node: images) : u8 / [N,C,H,W] / [1,3,640,640]
[ INFO ] Model outputs:
[ INFO ]     output (node: output) : f32 / [...] / [1,25200,85]
[Step 7/11] Loading the model to the device
[ INFO ] Compile model took 771.48 ms
[Step 8/11] Querying optimal runtime parameters
[ INFO ] Model:
[ INFO ]   NETWORK_NAME: torch_jit
[ INFO ]   OPTIMAL_NUMBER_OF_INFER_REQUESTS: 8
[ INFO ]   NUM_STREAMS: 8
[ INFO ]   AFFINITY: Affinity.NUMA
[ INFO ]   INFERENCE_NUM_THREADS: 32
[ INFO ]   PERF_COUNT: False
[ INFO ]   INFERENCE_PRECISION_HINT: <Type: 'float32'>
[ INFO ]   PERFORMANCE_HINT: PerformanceMode.UNDEFINED
[ INFO ]   PERFORMANCE_HINT_NUM_REQUESTS: 0
[Step 9/11] Creating infer requests and preparing input tensors
[ WARNING ] No input files were given for input 'images'!. This input will be filled with random values!
[ INFO ] Fill input 'images' with random values
[Step 10/11] Measuring performance (Start inference asynchronously, 8 inference requests using 8 streams for CPU, limits: 60000 ms duration)
[ INFO ] Benchmarking in inference-only mode (inputs filling are not included in measurement loop).
[ INFO ] First inference took 142.49 ms
[Step 11/11] Dumping statistics report
[ INFO ] Count:            1576 iterations
[ INFO ] Duration:         60502.22 ms
[ INFO ] Latency:
[ INFO ]    Median:        306.83 ms
[ INFO ]    Average:       306.65 ms
[ INFO ]    Min:           270.56 ms
[ INFO ]    Max:           333.05 ms
[ INFO ] Throughput:   26.05 FPS

How can I achieve more FPS? In my application I'm not able to get more than 10 FPS.

wangleis commented 1 year ago

@akashAD98 Please refer to the following suggestions:

  1. Please refer to https://github.com/openvinotoolkit/openvino_notebooks/blob/main/notebooks/115-async-api/115-async-api.ipynb to build your application on AsyncInferQueue for an asynchronous inference pipeline
  2. Use below setting for compile_model()
    setting = {"INFERENCE_NUM_THREADS":"32", "AFFINITY":"NUMA"}
    compiled_model = core.compile_model(model=model, device_name="CPU", config=setting)
  3. Separate input data generation, pre-processing, and post-processing into different loops, and keep the inference loop running at full speed (see the sketch after this list)
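For example, a rough sketch putting these pieces together (the model path is the one from your snippet; the pre-processing and post-processing steps are placeholders, not code from the notebook):

import numpy as np
from openvino.runtime import Core, AsyncInferQueue

core = Core()
model = core.read_model("object_detector/yolov7_int8new.xml")
setting = {"INFERENCE_NUM_THREADS": "32", "AFFINITY": "NUMA"}
compiled_model = core.compile_model(model=model, device_name="CPU", config=setting)

results = {}

def on_done(request, frame_id):
    # only grab the raw output here; run NMS/tracking later, outside the hot loop
    results[frame_id] = request.get_output_tensor(0).data.copy()

# jobs=0 lets OpenVINO pick the optimal number of infer requests
queue = AsyncInferQueue(compiled_model, 0)
queue.set_callback(on_done)

# 1. pre-process frames before the inference loop (random data as a stand-in)
frames = [np.random.rand(1, 3, 640, 640).astype(np.float32) for _ in range(32)]

# 2. the inference loop only submits work, so requests overlap on the CPU streams
for frame_id, blob in enumerate(frames):
    queue.start_async({0: blob}, frame_id)
queue.wait_all()

# 3. post-process afterwards using results[frame_id]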
adrianboguszewski commented 1 year ago

Closing because of the lack of activity. Please reopen if needed.