Closed Liujingxiu23 closed 4 years ago
There is another question. For my own model, I convert the model torch -> onnx -> openvino and get a .bin and an .xml file. How can I do model inference using the C++ API with OpenVINO only? Are there any examples for user-defined models with arbitrary model structures?
For example, my model has conv and LSTM layers, with input [batch_size, input_len, feat_dim] and output [batch_size, 1, output_num]. My input is not Layout::NCHW or any other type in Layout.
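For the pure-OpenVINO route, a minimal C++ Inference Engine sketch might look like the following. This assumes the 2020.x Inference Engine API; the file names, the FP32 precision, and the use of Layout::ANY for the 3D [batch_size, input_len, feat_dim] input are illustrative assumptions, not taken from this issue:

```cpp
#include <inference_engine.hpp>
#include <iostream>

int main() {
    using namespace InferenceEngine;

    Core ie;
    // Paths are placeholders for the .xml/.bin pair produced by the Model Optimizer.
    CNNNetwork network = ie.ReadNetwork("model.xml", "model.bin");

    // Single input with shape [batch_size, input_len, feat_dim]; since it is
    // not an image, a generic layout is used instead of NCHW.
    auto input_item = *network.getInputsInfo().begin();
    input_item.second->setPrecision(Precision::FP32);
    input_item.second->setLayout(Layout::ANY);

    auto output_item = *network.getOutputsInfo().begin();
    output_item.second->setPrecision(Precision::FP32);

    ExecutableNetwork exec = ie.LoadNetwork(network, "CPU");
    InferRequest request = exec.CreateInferRequest();

    // Fill the input blob with feature data.
    Blob::Ptr input_blob = request.GetBlob(input_item.first);
    auto input_holder = as<MemoryBlob>(input_blob)->wmap();
    float* in = input_holder.as<float*>();
    // ... copy batch_size * input_len * feat_dim floats into `in` ...
    (void)in;

    request.Infer();

    Blob::Ptr output_blob = request.GetBlob(output_item.first);
    auto output_holder = as<MemoryBlob>(output_blob)->rmap();
    const float* out = output_holder.as<const float*>();
    std::cout << "first output value: " << out[0] << std::endl;
    return 0;
}
```

This is essentially what the official samples do, minus the image-specific preprocessing; for non-NCHW data you only need the raw blob access shown above.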
CC @tomdol @ilyachur
Is it applicable to use one of the predefined samples from https://github.com/openvinotoolkit/openvino/tree/master/inference-engine/samples? E.g. we have a generic app like benchmark_app to measure the performance of any network, as well as dedicated apps for classification and detection networks.
@Liujingxiu23 do you have any more information about the segmentation fault? A core dump/stack trace perhaps? You can re-run under gdb to obtain it. There are plenty of possible reasons for this segfault, but without a stack trace it's going to be virtually impossible to help.
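As a concrete way to do that, gdb can be run in batch mode so it prints the backtrace automatically at the crash (the binary name below is a placeholder for your app):

```shell
# Run the app under gdb; on SIGSEGV, print the backtrace and exit.
gdb -batch -ex run -ex bt ./example-app

# Alternatively, inspect an existing core dump:
gdb -batch -ex bt ./example-app core
```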
The differences between the code on GitHub and the .tgz packages:
Let me know if this helps.
I used the master branch of openvino and onnxruntime from git. Which branches are more suitable?
For the gdb backtrace:

```
Starting program: /data/liujingxiu/Torch-Vs-ONNX/Personal-Model/test_onnxruntime-openvino/example-app
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib64/libthread_db.so.1".
2020-06-11 09:50:53.228775700 [W:onnxruntime:test, abi_session_options.cc:147 SetIntraOpNumThreads] Since openmp is enabled in this build, this API cannot be used to configure intra op num threads. Please use the openmp environment variables to control the number of threads.
2020-06-11 09:50:53.254344213 [W:onnxruntime:, graph.cc:2619 CleanUnusedInitializers] Removing initializer 'similarity_bias'. It is not used by any node and should be removed from the model.
2020-06-11 09:50:53.254374796 [W:onnxruntime:, graph.cc:2619 CleanUnusedInitializers] Removing initializer 'similarity_weight'. It is not used by any node and should be removed from the model.
num_input_nodes: 1
Input 0 : name=inputs
Input 0 : type=1
Input 0 : num_dims=3
Input 0 : dim 0=-1
Input 0 : dim 1=-1
Input 0 : dim 2=40
======================= Before Run ======================
[WARN] 2020-06-11T01:50:53z src/ngraph/frontend/onnx_import/ops_bridge.cpp 190 Domain 'com.microsoft.nchwc' not recognized by nGraph
[WARN] 2020-06-11T01:50:53z src/ngraph/frontend/onnx_import/ops_bridge.cpp 190 Domain 'com.microsoft' not recognized by nGraph
[WARN] 2020-06-11T01:50:53z src/ngraph/frontend/onnx_import/ops_bridge.cpp 190 Domain 'com.microsoft.mlfeaturizers' not recognized by nGraph
[WARN] 2020-06-11T01:50:53z src/ngraph/frontend/onnx_import/ops_bridge.cpp 190 Domain 'ai.onnx.preview.training' not recognized by nGraph
[WARN] 2020-06-11T01:50:53z src/ngraph/frontend/onnx_import/ops_bridge.cpp 190 Domain 'ai.onnx.training' not recognized by nGraph
[WARN] 2020-06-11T01:50:53z src/ngraph/frontend/onnx_import/ops_bridge.cpp 190 Domain 'ai.onnx.ml' not recognized by nGraph

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff757437c in ngraph::op::v0::Parameter::Parameter(ngraph::element::Type const&, ngraph::PartialShape const&, bool) ()
    from /data/liujingxiu/Torch-Vs-ONNX/Personal-Model/test_onnxruntime-openvino/libonnxruntime/onnxruntime/lib/libngraph.so
Missing separate debuginfos, use: debuginfo-install glibc-2.17-222.el7.x86_64
(gdb) bt
    from /data/liujingxiu/Torch-Vs-ONNX/Personal-Model/test_onnxruntime-openvino/libonnxruntime/onnxruntime/lib/libngraph.so
    __p=<optimized out>, __a=...) at /usr/include/c++/5/bits/alloc_traits.h:530
    this=0x85c5c0) at /usr/include/c++/5/bits/shared_ptr_base.h:522
    at /usr/include/c++/5/bits/shared_ptr_base.h:617
    at /usr/include/c++/5/bits/shared_ptr.h:636
    at /home/personal/work/liujingxiu/onnxruntime/build/Linux/RelWithDebInfo/ngraph/src/project_ngraph/src/ngraph/frontend/onnx_import/core/value_info.hpp:102
    at /home/personal/work/liujingxiu/onnxruntime/build/Linux/RelWithDebInfo/ngraph/src/project_ngraph/src/ngraph/frontend/onnx_import/core/value_info.hpp:95
    at /home/personal/work/liujingxiu/onnxruntime/build/Linux/RelWithDebInfo/ngraph/src/project_ngraph/src/ngraph/frontend/onnx_import/core/graph.cpp:127
    at /home/personal/work/liujingxiu/onnxruntime/build/Linux/RelWithDebInfo/ngraph/src/project_ngraph/src/ngraph/frontend/onnx_import/onnx.cpp:73
    device_id=..., precision=...)
    at /home/personal/work/liujingxiu/onnxruntime/onnxruntime/core/providers/openvino/backend_utils.cc:51
    model_proto=..., global_context=..., subgraph_context=...)
    at /home/personal/work/liujingxiu/onnxruntime/onnxruntime/core/providers/openvino/backends/basic_backend.cc:29
    at /usr/include/c++/5/bits/alloc_traits.h:530
    at /usr/include/c++/5/bits/shared_ptr_base.h:522
    this=<optimized out>) at /usr/include/c++/5/bits/shared_ptr_base.h:1097
    this=<optimized out>) at /usr/include/c++/5/bits/shared_ptr.h:319
    at /usr/include/c++/5/bits/shared_ptr.h:636
    subgraph_context=...)
    at /home/personal/work/liujingxiu/onnxruntime/onnxruntime/core/providers/openvino/backends/backend_factory.cc:21
    context=0x7fffffffbb50)
    at /home/personal/work/liujingxiu/onnxruntime/onnxruntime/core/providers/openvino/backend_manager.cc:264
    api=<optimized out>, state=<optimized out>)
    at /home/personal/work/liujingxiu/onnxruntime/onnxruntime/core/providers/openvino/openvino_execution_provider.cc:967
    __args#1=<optimized out>, __args#2=<optimized out>) at /usr/include/c++/5/functional:1857
    __args#0=0x84a510, this=0x87f530) at /usr/include/c++/5/functional:2267
    at /home/personal/work/liujingxiu/onnxruntime/onnxruntime/core/framework/func_kernel.h:41
    feed_mlvalue_idxs=..., feeds=..., fetch_mlvalue_idxs=..., fetches=..., fetch_allocators=..., logger=...)
    at /home/personal/work/liujingxiu/onnxruntime/onnxruntime/core/framework/sequential_executor.cc:271
    feeds_fetches_manager=..., feeds=..., fetches=..., fetch_allocators=..., execution_mode=ORT_SEQUENTIAL,
    terminate_flag=@0x7fffffffdd48: false, logger=..., only_execute_path_to_fetches=false)
    at /home/personal/work/liujingxiu/onnxruntime/onnxruntime/core/framework/utils.cc:479
    feeds=..., fetches=..., execution_mode=ORT_SEQUENTIAL, terminate_flag=@0x7fffffffdd48: false,
    logger=..., only_execute_path_to_fetches=false)
    at /home/personal/work/liujingxiu/onnxruntime/onnxruntime/core/framework/utils.cc:539
    feed_names=..., feeds=..., output_names=..., p_fetches=0x7fffffffdcc0)
    at /home/personal/work/liujingxiu/onnxruntime/onnxruntime/core/session/inference_session.cc:1163
    input=<optimized out>, input_len=<optimized out>, output_names1=<optimized out>, output_names_len=1,
    output=0x7fffffffde10)
    at /home/personal/work/liujingxiu/onnxruntime/onnxruntime/core/session/onnxruntime_c_api.cc:503
```
I built onnxruntime again and found that not all tests passed: 4 of 5 passed and 1 failed. The related log is:
```
1: [----------] 1 test from ParallelExecutor
1: [ RUN ] ParallelExecutor.TestStatusPropagation
1: 2020-06-11 07:44:13.113890902 [E:onnxruntime:TestOp:TestOp, parallel_executor.cc:204 RunNodeAsync] Non-zero status code returned while running TestOp node. Name:'node1' Status Message: Action was 1
1: 2020-06-11 07:44:13.113926203 [E:onnxruntime:TestOp:TestOp, parallel_executor.cc:75 Execute] [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running TestOp node. Name:'node1' Status Message: Action was 1
1: 2020-06-11 07:44:13.129843868 [E:onnxruntime:TestOp:TestOp, parallel_executor.cc:204 RunNodeAsync] Non-zero status code returned while running TestOp node. Name:'node1' Status Message: Action was 1
1: 2020-06-11 07:44:13.129880202 [E:onnxruntime:TestOp:TestOp, parallel_executor.cc:75 Execute] [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running TestOp node. Name:'node1' Status Message: Action was 1
1: 2020-06-11 07:44:13.133232185 [E:onnxruntime:TestOp:TestOp, parallel_executor.cc:204 RunNodeAsync] Non-zero status code returned while running TestOp node. Name:'node1' Status Message: /home/personal/work/liujingxiu/onnxruntime/onnxruntime/test/framework/parallel_executor_test.cc:59 virtual onnxruntime::common::Status onnxruntime::test::TestOp::OpKernelImpl::Compute(onnxruntime::OpKernelContext) const Throwing as action was 2
1: 2020-06-11 07:44:13.133259500 [E:onnxruntime:TestOp:TestOp, parallel_executor.cc:75 Execute] [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running TestOp node. Name:'node1' Status Message: /home/personal/work/liujingxiu/onnxruntime/onnxruntime/test/framework/parallel_executor_test.cc:59 virtual onnxruntime::common::Status onnxruntime::test::TestOp::OpKernelImpl::Compute(onnxruntime::OpKernelContext) const Throwing as action was 2
1: 2020-06-11 07:44:13.148291937 [E:onnxruntime:TestOp:TestOp, parallel_executor.cc:204 RunNodeAsync] Non-zero status code returned while running TestOp node. Name:'node1' Status Message: /home/personal/work/liujingxiu/onnxruntime/onnxruntime/test/framework/parallel_executor_test.cc:59 virtual onnxruntime::common::Status onnxruntime::test::TestOp::OpKernelImpl::Compute(onnxruntime::OpKernelContext) const Throwing as action was 2
1: 2020-06-11 07:44:13.148322374 [E:onnxruntime:TestOp:TestOp, parallel_executor.cc:75 Execute] [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running TestOp node. Name:'node1' Status Message: /home/personal/work/liujingxiu/onnxruntime/onnxruntime/test/framework/parallel_executor_test.cc:59 virtual onnxruntime::common::Status onnxruntime::test::TestOp::OpKernelImpl::Compute(onnxruntime::OpKernelContext) const Throwing as action was 2
```
Can you check whether your model works through the OpenVINO API directly?
I want to use ORT since the API is really clear and simple, like torch and tensorflow. The benchmark_app example seems quite complicated for me to use.
Perhaps this issue should be addressed to onnxruntime then?
@Liujingxiu23 thanks for the backtrace. The segfault happens when the ONNX model is imported from ONNXRuntime into the OV provider. The backtrace indicates that it crashes when one of the inputs gets translated to an ngraph::Parameter, which segfaults in the constructor. Did you see the exact line in the constructor's body where the access violation happened? Also, there's a suggestion in the logs you shared to install: debuginfo-install glibc-2.17-222.el7.x86_64. Once you install it you should get an even more detailed stack trace.
The whole segfault looks very model-specific to me. Can you also share which model you've used for this test and whether it's publicly available?
@tomdol I'll try to install glibc-2.17-222.el7.x86_64. I'm sorry, I cannot share the model. But the model structure is very simple: 3 layers of unidirectional LSTM, with input size [-1, 160, 80], where -1 means dynamic batch size. My subject is not images, so the input is not NCHW or any other similar type.
> where -1 means dynamic batch-size
Just a suggestion: can you edit the model and use a dimension variable instead of -1? Just replace dim_value: -1 with something like this: https://github.com/openvinotoolkit/openvino/blob/master/ngraph/test/models/onnx/dynamic_shapes/ab_plus_c.prototxt#L25-L27
This is how we (in the ONNX importer) expect dynamic dimensions to be defined in ONNX models, and it adheres to this specification: https://github.com/onnx/onnx/blob/master/docs/IR.md#static-tensor-shapes
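For illustration, an input with a named dynamic batch dimension looks like this in the ONNX protobuf text format (the tensor name and the fixed dimensions below are made-up examples, not taken from the actual model):

```
input {
  name: "inputs"
  type {
    tensor_type {
      elem_type: 1  # FLOAT
      shape {
        dim { dim_param: "batch_size" }  # named variable instead of dim_value: -1
        dim { dim_value: 160 }
        dim { dim_value: 80 }
      }
    }
  }
}
```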
@tomdol Thank you for your reply! I will try to reset the batch size, and I will also try a more common model, for example ResNet-18, to verify whether the current problem only occurs with my own model.
I rebuilt onnxruntime rel-1.3.0 using l_openvino_toolkit_p_2020.2.120.tgz, and model prediction now runs successfully. Maybe the core dump happened because I built openvino the wrong way or used the wrong release version of onnxruntime. Thank you very much for your help! @tomdol
No problem, I'm glad it worked for you :) BTW. Can we close this ticket?
I want to use onnxruntime with openvino. I downloaded "l_openvino_toolkitp***.tgz" from https://docs.openvinotoolkit.org/, but the installation requires sudo privileges, which I do not have.
So I downloaded the sources from GitHub and compiled openvino from source code; the build completed successfully. Then I built onnxruntime with openvino from source code, and that build completed successfully too.
Then I tested C++ model inference. When I did not use openvino as the EP, everything worked. When using openvino as the EP, model loading seemed OK and the input dims could be printed successfully, but a core dump happened during Run.
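For context, the session setup that exercises this path can be sketched as below. This is an assumption-laden sketch for onnxruntime ~1.3 with the OpenVINO EP: the registration function's declaration normally comes from the provider factory header, and the "CPU_FP32" device string, model path, input/output names, and shape are all placeholders, not taken from the actual code:

```cpp
#include <onnxruntime_cxx_api.h>
#include <iostream>
#include <vector>

// Normally declared in the OpenVINO provider factory header; the exact
// header and signature vary between onnxruntime versions.
extern "C" OrtStatus* OrtSessionOptionsAppendExecutionProvider_OpenVINO(
    OrtSessionOptions* options, const char* device_id);

int main() {
  Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "test");
  Ort::SessionOptions session_options;
  // Register the OpenVINO execution provider for CPU FP32 inference.
  OrtSessionOptionsAppendExecutionProvider_OpenVINO(session_options, "CPU_FP32");

  Ort::Session session(env, "model.onnx", session_options);

  // Placeholder shape matching a [batch, input_len, feat_dim] style input.
  std::vector<int64_t> shape{1, 160, 80};
  std::vector<float> input(1 * 160 * 80, 0.f);
  Ort::MemoryInfo mem =
      Ort::MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);
  Ort::Value tensor = Ort::Value::CreateTensor<float>(
      mem, input.data(), input.size(), shape.data(), shape.size());

  const char* in_names[] = {"inputs"};    // placeholder input name
  const char* out_names[] = {"outputs"};  // placeholder output name
  // The crash described above happens inside this Run() call, while the EP
  // compiles the subgraph with nGraph.
  auto outputs = session.Run(Ort::RunOptions{nullptr}, in_names, &tensor, 1,
                             out_names, 1);
  std::cout << "outputs: " << outputs.size() << std::endl;
  return 0;
}
```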
```
[WARN] 2020-06-08T03:09:14z src/ngraph/frontend/onnx_import/ops_bridge.cpp 190 Domain 'com.microsoft.nchwc' not recognized by nGraph
.......
[WARN] 2020-06-08T03:09:14z src/ngraph/frontend/onnx_import/ops_bridge.cpp 190 Domain 'ai.onnx.ml' not recognized by nGraph
Segmentation fault (core dumped)
```
My questions are: