openvinotoolkit / openvino

OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference
https://docs.openvino.ai
Apache License 2.0

When the three models are inferenced in parallel, the inference time does not decrease #8551

Closed shixinlishixinli closed 2 years ago

shixinlishixinli commented 2 years ago

I am using OpenVINO to run inference on a formula model and a handwritten model at the same time, in parallel, using Python's threading module, but the inference time does not decrease. When all the models are inferenced on CPU, multithreading takes the same time as serial processing. When the handwritten model is inferenced on GPU and the formula model on CPU, multithreading takes more time than serial processing.

Can I choose how many CPU cores one model uses? And how can I accelerate inference with multithreading?

This is the multithreading code:

```python
threads = []
t1 = threading.Thread(target=formula.latex_recognizer,
                      args=(latex_crop_list, latex_encode_xml, latex_decode_xml, urlinfo, vocab, latex_result))
threads.append(t1)
t2 = threading.Thread(target=chinese_handwritten.handwritten_recognizer,
                      args=(handwritten_crop_list, handwritten_xml, handwritten_label, urlinfo, handwritten_result))
threads.append(t2)
t3 = threading.Thread(target=ppocr.text_recognizer,
                      args=(ppocr_crop_list, rec_xml, urlinfo, ppocr_result))
threads.append(t3)

for t in threads:
    t.daemon = True
    t.start()
for t in threads:
    t.join()
```

Best, Lisa Shi

Iffa-Intel commented 2 years ago

Hi,

From the OpenVINO perspective, the OpenVINO CPU plugin does have CPU-specific settings:

  1. KEY_CPU_THREADS_NUM, whose default value is 0, specifies the number of threads the CPU plugin should use for inference. Zero (the default) means use all (logical) cores.
  2. KEY_CPU_BIND_THREAD
  3. KEY_CPU_THROUGHPUT_STREAMS

You may refer to the OpenVINO CPU Plugin Supported Configuration Parameters documentation for a detailed explanation of each parameter.
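
For illustration, here is a minimal sketch of setting these keys through the pre-2022 Inference Engine Python API; the model paths and the thread count are placeholders, not values from your project:

```python
from openvino.inference_engine import IECore

ie = IECore()
net = ie.read_network(model="model.xml", weights="model.bin")

# Limit this network to 4 CPU threads instead of all logical cores,
# so networks running in parallel do not compete for the same cores.
exec_net = ie.load_network(
    network=net,
    device_name="CPU",
    config={
        "CPU_THREADS_NUM": "4",          # KEY_CPU_THREADS_NUM
        "CPU_BIND_THREAD": "YES",        # KEY_CPU_BIND_THREAD
        "CPU_THROUGHPUT_STREAMS": "1",   # KEY_CPU_THROUGHPUT_STREAMS
    },
)
```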

In terms of performance (since you mentioned reducing inference time): generally, performance means how fast the model is in deployment, and two key metrics are used to measure it: latency and throughput. There are also other factors that influence this, which you can view here for more details.
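
As a rough illustration of measuring latency per request (continuing the sketch above; `input_name` and `image` are placeholder names):

```python
import time

# Latency: wall-clock time of a single synchronous request.
# Throughput would instead count requests completed per second
# across several requests kept in flight in parallel.
start = time.perf_counter()
result = exec_net.infer(inputs={input_name: image})
latency_ms = (time.perf_counter() - start) * 1000
print(f"latency: {latency_ms:.1f} ms")
```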

shixinlishixinli commented 2 years ago

Thank you for the information.

I tried setting the CPU-specific settings, but they did not help in my case.

I am trying to infer three models in parallel: one is inferenced on GPU, and the other two on CPU. However, the inference time is slower than when all three models are inferenced on CPU. I want to know the reason, and how to run inference on the three models in parallel.

Best, Lisa Shi

Iffa-Intel commented 2 years ago

Before proceeding further, may I know what you are trying to achieve with these models and how these three models relate to each other?

Plus, do the inference results depend on each other?

For example, this Pedestrian Tracker C++ Demo uses two models that run in parallel: one is person-detection-retail-0013 and the other is person-reidentification-retail-0277.

Both models have their own functionality, and when used together with the demo code they showcase a pedestrian-tracking scenario: the demo reads frames from an input video sequence, detects pedestrians in the frames, and builds trajectories of the pedestrians' movement frame by frame.

There are parameters to run the application with the OpenVINO pre-trained models, inferencing the pedestrian detector on GPU and pedestrian re-identification on CPU. You may refer to the Pedestrian Tracker C++ Demo guide linked above.
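
As a rough Python analogue of that two-device setup (a sketch only; the demo itself is C++, and the file paths here are placeholders):

```python
from openvino.inference_engine import IECore

ie = IECore()

# One IECore instance can drive several networks on different devices.
det_net = ie.read_network(model="person-detection-retail-0013.xml",
                          weights="person-detection-retail-0013.bin")
reid_net = ie.read_network(model="person-reidentification-retail-0277.xml",
                           weights="person-reidentification-retail-0277.bin")

det_exec = ie.load_network(network=det_net, device_name="GPU")    # detector on GPU
reid_exec = ie.load_network(network=reid_net, device_name="CPU")  # re-id on CPU
```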

If these are relevant to your use case, you may refer to that Demo source code and modify according to your needs.

shixinlishixinli commented 2 years ago

Hi,
The inference results do not depend on each other. The three models are formula, handwritten, and PPOCR. The input is a picture that includes formulas, handwriting, and printed words.

First, the positions of the formulas, handwriting, and printed words are detected by the PPOCR detection model. Second, the formula crop is inferenced by the formula model on CPU, while the handwriting crop is inferenced by the handwritten model on GPU; at the same time, the printed-words crop is inferenced by the PPOCR recognition model on CPU. I run these three models with multithreading, and I am sure all three start inference at the same time. Finally, all the results are collected as the output.

Actually, with handwriting inferenced on GPU while formula and printed words are inferenced on CPU, the time is slower: 1.4 s. With handwriting, formula, and printed words all inferenced on CPU, the time is faster: 1.0 s.

This is completely the opposite of what I expected: inference split across CPU and GPU should be faster. I want to know the reason, and how to run the three models on CPU and GPU in parallel.

Best, Lisa Shi

Iffa-Intel commented 2 years ago

The OpenVINO Multi-Device plugin might be what you are looking for. Generally, the Multi-Device plugin automatically assigns inference requests to the available computational devices to execute the requests in parallel.

I think what you are trying to do is share inference requests between two devices, which can be done with MULTI. The easiest way is to specify a number of requests for each device using parentheses: "MULTI:CPU(2),GPU(1)". However, such an explicit configuration is not performance-portable and is not recommended. The better way is to configure the individual devices and query the resulting number of requests at the application level (see "Configure the Individual Devices and Creating the Multi-Device On Top").
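
A minimal sketch of both approaches with the pre-2022 Python API (assuming `net` has already been read as in the earlier sketch):

```python
# Explicit, not performance-portable: fixed request split per device
exec_net = ie.load_network(network=net, device_name="MULTI:CPU(2),GPU(1)")

# Preferred: let MULTI use the devices' own defaults, then ask how many
# parallel requests the application should keep in flight
exec_net = ie.load_network(network=net, device_name="MULTI:CPU,GPU")
nireq = exec_net.get_metric("OPTIMAL_NUMBER_OF_INFER_REQUESTS")
```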

Accelerators generally work well with Multi-Device, while CPU+GPU execution poses some performance caveats, as these devices share power, bandwidth, and other resources. For example, it is recommended to enable the GPU throttling hint (which saves another CPU thread for CPU inference).
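
For reference, a sketch of enabling that hint through the old Python API; the key name "CLDNN_PLUGIN_THROTTLE" is my assumption based on the 2021.x GPU (clDNN) plugin keys, so please verify it against your version's documentation:

```python
# Assumed key name for the GPU throttling hint (verify for your release);
# "1" frees up a CPU thread that the GPU plugin would otherwise occupy.
ie.set_config(config={"CLDNN_PLUGIN_THROTTLE": "1"}, device_name="GPU")
```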

Please read the OpenVINO Multi-Device Plugin documentation carefully for examples and detailed information.

Iffa-Intel commented 2 years ago

Closing issue, feel free to re-open or start a new issue if additional assistance is needed.