openvinotoolkit / openvino

OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference
https://docs.openvino.ai
Apache License 2.0

[Bug] Speed up on Multiple NCS2 #11767

Closed baicaiPCX closed 2 years ago

baicaiPCX commented 2 years ago

Hello, I want to speed up inference by using multiple NCS2 sticks, but inference on each device of multiple NCS2 sticks takes the same time as inference on a single NCS2. For example, the inference duration is 1.7 seconds for 100 inputs on two NCS2 devices, whereas the inference duration is also 1.7 seconds for 50 inputs on a single NCS2 device. I tested the performance using the asynchronous API. Why does the performance on each of the two NCS2 devices not seem to improve?


brmarkus commented 2 years ago

Can you describe your implementation in more detail, please?

How exactly do you parallelize the inference requests? Do you just send independent inference requests to multiple NCS2 devices? In that case the total throughput will be higher, as each device runs its own inference request in parallel, but the inference processing itself will not become faster...
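The throughput-vs-latency distinction above can be illustrated without any NCS2 hardware. The sketch below is a pure-Python analogy, not OpenVINO code: each "device" is a worker thread and each "inference request" is a fixed-latency job. Doubling the workers roughly halves the total wall time for a batch, but each individual request still takes the same time.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_inference(job_id):
    """Stand-in for one inference request; per-request latency is fixed."""
    time.sleep(0.02)  # every request always takes ~20 ms
    return job_id

def run_batch(num_jobs, num_devices):
    """Run num_jobs requests spread across num_devices parallel 'devices'."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_devices) as pool:
        results = list(pool.map(fake_inference, range(num_jobs)))
    return time.perf_counter() - start, results

one_dev_time, one_dev_results = run_batch(20, 1)  # ~0.4 s total
two_dev_time, two_dev_results = run_batch(20, 2)  # ~0.2 s total
print(f"1 device: {one_dev_time:.2f}s, 2 devices: {two_dev_time:.2f}s")
```

Total wall time for the batch drops with a second "device" (higher throughput), yet `fake_inference` itself never gets faster, which matches what the reporter observed: 100 inputs on two sticks take about as long as 50 inputs on one.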

Or do you make use of the MULTI or HETERO plugin to combine multiple devices?
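For reference, a MULTI plugin target is just a device string of the form `MULTI:<dev1>,<dev2>` passed to `compile_model`. A minimal sketch of composing such a string (the MYRIAD device names are the ones quoted later in this thread; only the string handling runs here, no OpenVINO call is executed):

```python
def multi_device(*devices):
    """Build a MULTI plugin target string from individual device names."""
    return "MULTI:" + ",".join(devices)

# Device names as reported by the thread author; adjust to your own sticks.
target = multi_device("MYRIAD.1.7-ma2480", "MYRIAD.1.1-ma2480")
print(target)  # MULTI:MYRIAD.1.7-ma2480,MYRIAD.1.1-ma2480

# With OpenVINO installed and both sticks attached, this target would be
# passed to Core().compile_model(model, device_name=target) -- not run here.
```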

baicaiPCX commented 2 years ago

Yes, I also think that each of the multiple NCS2 devices just receives independent inference requests. Could you tell me how to avoid this? My Python implementation is as follows:

from openvino.runtime import Core, AsyncInferQueue
import openvino.runtime as ov

# init model
ie = Core()
model = ie.read_model(model="resnet50.xml")
compiled_model = ie.compile_model(model=model, device_name="MULTI:MYRIAD.1.7-ma2480,MYRIAD.1.1-ma2480")

# create inference requests
input_tensors = [ov.Tensor(input, ov.Shape(input.shape)) for input in inputs]  # inputs is a list of input arrays
infer_queue = AsyncInferQueue(compiled_model, len(input_tensors))

# infer
for i, input_tensor in enumerate(input_tensors):
    infer_queue.start_async({0: input_tensor}, i)
infer_queue.wait_all()

Look forward to your feedback, thanks.

jgespino commented 2 years ago

@baicaiPCX Have you tried measuring performance using the benchmark_app? Could you please share your model and input images so I can test from my side? If possible, provide the model in its native framework format (ONNX, Caffe, TF, etc.) along with the Model Optimizer command you used.

Also, what version of OpenVINO are you using?
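A rough sketch of how a benchmark_app invocation for the two-stick MULTI target might look. The flag values mirror the model and device names quoted earlier in this thread and are assumptions about the reporter's setup; the command is only assembled and printed here, not executed:

```python
import shlex

# Hypothetical benchmark_app command line for a two-stick MULTI target.
# Model path and device names are illustrative; adjust to your environment.
cmd = [
    "benchmark_app",
    "-m", "resnet50.xml",                               # model IR file
    "-d", "MULTI:MYRIAD.1.7-ma2480,MYRIAD.1.1-ma2480",  # both NCS2 sticks
    "-api", "async",                                    # asynchronous mode
]
print(shlex.join(cmd))
```

Comparing the reported throughput (FPS) of this run against a single-device run (`-d MYRIAD.1.7-ma2480`) would show whether the MULTI plugin is actually spreading requests across both sticks.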

baicaiPCX commented 1 year ago

Hello, your email has been received by Xiaopi!!! [automatic reply, translated from Chinese]