openvinotoolkit / openvino

OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference
https://docs.openvino.ai
Apache License 2.0

[Performance]: How to assign model inference to specific CPUs? #27083

Open LinGeLin opened 6 hours ago

LinGeLin commented 6 hours ago

OpenVINO Version

2024.4.0

Operating System

Ubuntu 20.04 (LTS)

Device used for inference

CPU

OpenVINO installation

Build from source

Programming Language

C++

Hardware Architecture

x86 (64 bits)

Model used

ps model

Model quantization

No

Target Platform

No response

Performance issue description

I am developing a C++ gRPC service that integrates OpenVINO (ov). The project uses multiple thread pools for preprocessing, and I have observed that inference performance is significantly lower than the numbers reported by benchmark_app. I suspect this is caused by thread contention between OpenVINO and the project's preprocessing threads, so I want to test the following setup:

Since the project loads two models simultaneously, I would like to dedicate CPUs 0-11 to Model A, CPUs 12-19 to Model B, and CPUs 20-23 to the rest of the project. However, I have not found an interface in OpenVINO for binding a model to specific CPUs when loading it. Are there any other suggestions? Thank you.

Step-by-step reproduction

No response


wangleis commented 5 hours ago

Hi @LinGeLin, do you run two models in one application process?

LinGeLin commented 5 hours ago

Hi @LinGeLin, do you run two models in one application process?

Yes.