openvinotoolkit / openvino

OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference
https://docs.openvino.ai
Apache License 2.0

The effect of openvino on dynamic shape tasks seems to be slower than onnxruntime #15831

Closed: sanbuphy closed this issue 1 month ago

sanbuphy commented 1 year ago

I have tried a lot of models. There is no doubt that openvino is faster when the input shape is fixed, but for dynamic-input models such as BERT, openvino cannot catch up with the speed of onnxruntime.

I would like to know the reason for this. Thank you very much.

rkazants commented 1 year ago

Hi @sanbuphy,

Generally there are a few factors that can affect performance in the case of dynamism: 1) not all optimization transformations are applicable to dynamic shapes, so we may be missing something; 2) the memory size to allocate is only known during inference, so we can't pre-allocate it during model loading as we do for static shapes; 3) the plugin kernels for dynamic shapes are quite different.
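
To make the static vs. dynamic distinction concrete, here is a minimal editorial sketch (not part of the original reply) using the OpenVINO Python API; the model file "bert.onnx", the BERT-style input names, and the [1, 128] shape are assumptions:

```python
# Minimal sketch (editorial). A model exported with dynamic sequence length keeps
# dynamic dimensions, so the plugin resolves kernels and memory per infer request.
# Reshaping to fixed shapes before compile_model lets pre-allocation and
# shape-dependent optimizations apply.
from openvino.runtime import Core, PartialShape

core = Core()
model = core.read_model("bert.onnx")  # placeholder model path

# Option 1: compile with the dynamic shapes as exported.
compiled_dynamic = core.compile_model(model, "CPU")

# Option 2: fix the shapes first (batch 1, sequence length 128 assumed here).
model.reshape({
    "input_ids": PartialShape([1, 128]),
    "attention_mask": PartialShape([1, 128]),
    "token_type_ids": PartialShape([1, 128]),
})
compiled_static = core.compile_model(model, "CPU")
```

If the real sequence lengths vary from request to request, the reshape option does not apply and the dynamic path is what gets measured, which is the case this issue is about.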

It would be great if you could share more details about your case: a model link, hardware details, and performance metrics, so we can take a look at it.

Best regards, Roman

sanbuphy commented 1 year ago

Hi! Thank you very much. I love openvino and I want to try to make it better. Let me prepare a complete test case for you.

sanbuphy commented 1 year ago

Hello @rkazants, I sent the model to your email instead because it exceeds 25 MB.

rkazants commented 1 year ago

Hi @sanbuphy,

Forwarded the model to our plugin team. @dmitry-gorokhov, please take a look.

Best regards, Roman

maxnick commented 1 year ago

Hi @sanbuphy,

Thank you for the benchmarking scripts you prepared! Could you please elaborate a little bit on your workload? The thing is, in the case of dynamic shape models, benchmarking gets a little tricky, and it is always better to adjust it to the use case scenario to get a more accurate comparison.

Regarding your benchmark: I played around with it, and here are the conclusions I can draw:

  1. We spotted some performance issues in our Python API and we are working on them.
  2. The current OpenVINO master demonstrates a better average infer request time on my Intel(R) Core(TM) i9-10980XE, but to measure the average infer request time your script needs to be modified. To have more control over the accuracy of the results, it is better to use a simple loop with calls to the Python time library instead of Jupyter %%timeit. If we rewrite the %%timeit cell as a loop and measure not only the average time but also the single infer request time, we will see that with OV the time for a single infer request declines over the first several iterations until it reaches a constant value. This is specific to TBB, which we use by default, while ORT uses OpenMP and does not show such an effect. So, if the number of iterations is relatively low (e.g. 10), the results do not reflect truly "average" performance, since they contain only the first, relatively slow infer requests before the infer request time stabilizes. Thus, to measure the average time it is better to do more iterations, for example 100 (a minimal timing sketch is shown after this list). But again, if in your workload you simply load the model, do one infer request, and unload the model, then the benchmark is built properly and we have to put some effort into optimizing the time of the first few iterations (see item 1).
  3. If in your workload the input shapes change from infer request to infer request, then this benchmark is not very accurate, since with only one input shape it is not possible to reproduce all the dynamic shape processing overheads in both OpenVINO and ONNX RT.
  4. To avoid an internal copy of the input data, it is possible to use the numpy array memory directly via the shared_memory=True parameter in the request.infer(input_data, shared_memory=True) call (also shown in the sketch below). Please see https://github.com/openvinotoolkit/openvino/blob/master/src/bindings/python/src/openvino/runtime/ie_api.py#L51 for details. It does not make a significant impact in this benchmark since the input data sizes are relatively small, but it is good to know that the input data usage can be optimized.
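
As an editorial sketch of the loop-based measurement described in items 2 and 4 (the model file "bert.onnx", the input names, and the (1, 128) shapes are placeholders; shared_memory=True is the parameter referenced above):

```python
# Sketch only: warm-up plus a simple timing loop instead of Jupyter %%timeit.
import time
import numpy as np
from openvino.runtime import Core

core = Core()
compiled = core.compile_model(core.read_model("bert.onnx"), "CPU")  # placeholder model
request = compiled.create_infer_request()

inputs = {
    "input_ids": np.random.randint(0, 1000, (1, 128)).astype(np.int64),
    "attention_mask": np.ones((1, 128), dtype=np.int64),
    "token_type_ids": np.zeros((1, 128), dtype=np.int64),
}

# Warm-up: let the per-request time settle (the first TBB-affected requests are slower).
for _ in range(10):
    request.infer(inputs, shared_memory=True)

# Measure every request separately so both single-request and average times are visible.
latencies = []
for _ in range(100):
    start = time.perf_counter()
    request.infer(inputs, shared_memory=True)  # shared_memory avoids copying the numpy inputs
    latencies.append(time.perf_counter() - start)

avg_ms = sum(latencies) / len(latencies) * 1e3
print(f"average: {avg_ms:.2f} ms, min: {min(latencies) * 1e3:.2f} ms")
```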
sanbuphy commented 1 year ago

Hi! @maxnick

Thank you very much for your detailed help. Yes, my case probably mainly cares about seeing faster results at the beginning (the first few infer requests). I hope OpenVINO will become better and better. I will try to replace the "%%timeit" approach and use a different way to test the inference time.

If you need it, I can collect data for different dynamic shapes and make a new demo, but I came to this conclusion based on overall findings: when the shape is not fixed, openvino can't always exert its full power, and its speed ends up close to ort. My conclusion now is that openvino is almost always fastest on static shapes, but it can be slower with dynamic inputs and may incur significant memory allocation overhead for multi-shape tasks.

Next, I may need to study the difference between OpenMP and TBB that you mentioned, because it also relates to a very difficult problem I studied before: the speed of openvino under multi-threading is not faster than ort (https://github.com/openvinotoolkit/openvino/issues/15730#event-8566494619, https://github.com/openvinotoolkit/openvino/issues/15908#event-8642790927, https://github.com/openvinotoolkit/openvino/issues/15573#event-8512643071). I need to collect more information from the ort source code before the problem can be solved, or before I can raise better targeted questions.

Thank you everyone.

akladiev commented 1 year ago

This issue will be closed in 2 weeks in case of no activity.

github-actions[bot] commented 2 months ago

This issue will be closed in a week because of 9 months of no activity.

github-actions[bot] commented 1 month ago

This issue was closed because it has been stalled for 9 months with no activity.