openvinotoolkit / openvino

OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference
https://docs.openvino.ai
Apache License 2.0

openvino used by onnxruntime, build successfully, run core dump #851

Closed Liujingxiu23 closed 4 years ago

Liujingxiu23 commented 4 years ago

I want to use onnxruntime with openvino. I downloaded "l_openvino_toolkitp***.tgz" from https://docs.openvinotoolkit.org/, but the installation requires sudo privileges, which I do not have.

So I downloaded the source code from GitHub instead and compiled openvino from source; the build finished successfully. Then I built onnxruntime with openvino from source, and that build finished successfully too.

Then I tested C++ model inference. When I did not use openvino as the EP, everything was fine. When using openvino as the EP, model loading seemed OK and the input dims could be printed successfully, but a core dump happened during Run:

```
[WARN] 2020-06-08T03:09:14z src/ngraph/frontend/onnx_import/ops_bridge.cpp 190 Domain 'com.microsoft.nchwc' not recognized by nGraph
.......
[WARN] 2020-06-08T03:09:14z src/ngraph/frontend/onnx_import/ops_bridge.cpp 190 Domain 'ai.onnx.ml' not recognized by nGraph
Segmentation fault (core dumped)
```
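For reference, the way I enable the OpenVINO EP is roughly the following (a minimal sketch, not my exact code; the device string "CPU_FP32" and the model path are placeholders):

```cpp
#include <onnxruntime_cxx_api.h>

// Provider-factory entry point shipped with OpenVINO-enabled ORT builds of
// this era; normally declared in the OpenVINO provider factory header.
extern "C" OrtStatus* OrtSessionOptionsAppendExecutionProvider_OpenVINO(
    OrtSessionOptions* options, const char* device_id);

int main() {
  Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "test");
  Ort::SessionOptions opts;
  // Register OpenVINO as the execution provider. Without this call the
  // default CPU EP is used -- the configuration that runs fine for me.
  OrtSessionOptionsAppendExecutionProvider_OpenVINO(opts, "CPU_FP32");
  Ort::Session session(env, "model.onnx", opts);  // placeholder model path
  // ... query input info, create input tensors, then call session.Run(...),
  // which is where the segmentation fault above occurs.
  return 0;
}
```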

My questions are:

  1. "l_openvino_toolkitp***.tgz" vs git source-code, what is the difference? Can I use source code downloaded from git to build and install openvino when use it in onnxruntime
  2. For "Core dump", What are the possible reasons?
Liujingxiu23 commented 4 years ago

There is another question. For my own model, I converted it torch -> onnx -> openvino, and I got a .bin and a .xml. How can I do model inference using the C++ API with openvino only? Are there any examples for user-defined models with arbitrary model structures?

For example, my model has conv and lstm layers, with input [batch_size, input_len, feat_dim] and output [batch_size, 1, output_num]. My input is not Layout::NCHW or any other type in Layout.

ilya-lavrenov commented 4 years ago

CC @tomdol @ilyachur

> For example, my model has conv and lstm layers, with input [batch_size, input_len, feat_dim] and output [batch_size, 1, output_num]. My input is not Layout::NCHW or any other type in Layout.

Would it be applicable to use one of the predefined samples from https://github.com/openvinotoolkit/openvino/tree/master/inference-engine/samples? E.g. we have a generic app like benchmark_app to measure the performance of any network, as well as dedicated apps for classification and detection networks.
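If the samples don't fit, direct Inference Engine C++ usage looks roughly like this: a minimal sketch against the 2020.x API, where the file names, the CPU device, and the input/output handling are placeholders and error handling is omitted:

```cpp
#include <inference_engine.hpp>

#include <string>

int main() {
    // Load the .xml/.bin pair produced by the Model Optimizer.
    InferenceEngine::Core ie;
    InferenceEngine::CNNNetwork network = ie.ReadNetwork("model.xml", "model.bin");

    // Compile for a device and create an inference request.
    InferenceEngine::ExecutableNetwork exec_net = ie.LoadNetwork(network, "CPU");
    InferenceEngine::InferRequest request = exec_net.CreateInferRequest();

    // A non-image input does not need an NCHW layout; the blob simply
    // follows the dimensions the network was converted with.
    const std::string input_name = network.getInputsInfo().begin()->first;
    InferenceEngine::Blob::Ptr input_blob = request.GetBlob(input_name);
    // ... fill input_blob with [batch_size, input_len, feat_dim] data ...

    request.Infer();

    const std::string output_name = network.getOutputsInfo().begin()->first;
    InferenceEngine::Blob::Ptr output_blob = request.GetBlob(output_name);
    // ... read the [batch_size, 1, output_num] results from output_blob ...
    return 0;
}
```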

tomdol commented 4 years ago

@Liujingxiu23 do you have any more information about the segmentation fault? A core dump/stack trace perhaps? You can re-run the program under gdb to obtain one. There are plenty of possible reasons for this segfault, but without a stack trace it's going to be virtually impossible to help.
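For example, a typical session looks like this (./your_app stands for your test binary):

```
$ gdb ./your_app
(gdb) run          # reproduce the crash under the debugger
...
Program received signal SIGSEGV, Segmentation fault.
(gdb) bt           # print the backtrace at the point of the crash
```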

The differences between the code in github and the tgz packages:

Let me know if this helps.

Liujingxiu23 commented 4 years ago

I used the master branches of openvino and onnxruntime from git. Which branches are more suitable?

For the gdb backtrace:

```
Starting program: /data/liujingxiu/Torch-Vs-ONNX/Personal-Model/test_onnxruntime-openvino/example-app
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib64/libthread_db.so.1".
2020-06-11 09:50:53.228775700 [W:onnxruntime:test, abi_session_options.cc:147 SetIntraOpNumThreads] Since openmp is enabled in this build, this API cannot be used to configure intra op num threads. Please use the openmp environment variables to control the number of threads.
2020-06-11 09:50:53.254344213 [W:onnxruntime:, graph.cc:2619 CleanUnusedInitializers] Removing initializer 'similarity_bias'. It is not used by any node and should be removed from the model.
2020-06-11 09:50:53.254374796 [W:onnxruntime:, graph.cc:2619 CleanUnusedInitializers] Removing initializer 'similarity_weight'. It is not used by any node and should be removed from the model.
num_input_nodes: 1
Input 0 : name=inputs
Input 0 : type=1
Input 0 : num_dims=3
Input 0 : dim 0=-1
Input 0 : dim 1=-1
Input 0 : dim 2=40
======================= Before Run ======================
[WARN] 2020-06-11T01:50:53z src/ngraph/frontend/onnx_import/ops_bridge.cpp 190 Domain 'com.microsoft.nchwc' not recognized by nGraph
[WARN] 2020-06-11T01:50:53z src/ngraph/frontend/onnx_import/ops_bridge.cpp 190 Domain 'com.microsoft' not recognized by nGraph
[WARN] 2020-06-11T01:50:53z src/ngraph/frontend/onnx_import/ops_bridge.cpp 190 Domain 'com.microsoft.mlfeaturizers' not recognized by nGraph
[WARN] 2020-06-11T01:50:53z src/ngraph/frontend/onnx_import/ops_bridge.cpp 190 Domain 'ai.onnx.preview.training' not recognized by nGraph
[WARN] 2020-06-11T01:50:53z src/ngraph/frontend/onnx_import/ops_bridge.cpp 190 Domain 'ai.onnx.training' not recognized by nGraph
[WARN] 2020-06-11T01:50:53z src/ngraph/frontend/onnx_import/ops_bridge.cpp 190 Domain 'ai.onnx.ml' not recognized by nGraph

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff757437c in ngraph::op::v0::Parameter::Parameter(ngraph::element::Type const&, ngraph::PartialShape const&, bool) ()
   from /data/liujingxiu/Torch-Vs-ONNX/Personal-Model/test_onnxruntime-openvino/libonnxruntime/onnxruntime/lib/libngraph.so
Missing separate debuginfos, use: debuginfo-install glibc-2.17-222.el7.x86_64
(gdb) bt
#0  0x00007ffff757437c in ngraph::op::v0::Parameter::Parameter(ngraph::element::Type const&, ngraph::PartialShape const&, bool) ()
    from /data/liujingxiu/Torch-Vs-ONNX/Personal-Model/test_onnxruntime-openvino/libonnxruntime/onnxruntime/lib/libngraph.so
#1  0x00007ffff5201b4d in construct<ngraph::op::v0::Parameter, ngraph::element::Type const&, ngraph::PartialShape const&> (__p=<optimized out>, this=<optimized out>) at /usr/include/c++/5/ext/new_allocator.h:120
#2  construct<ngraph::op::v0::Parameter, ngraph::element::Type const&, ngraph::PartialShape const&> (__p=<optimized out>, __a=...) at /usr/include/c++/5/bits/alloc_traits.h:530
#3  _Sp_counted_ptr_inplace<ngraph::element::Type const&, ngraph::PartialShape const&> (__a=..., this=0x85c5c0) at /usr/include/c++/5/bits/shared_ptr_base.h:522
#4  __shared_count<ngraph::op::v0::Parameter, std::allocator, ngraph::element::Type const&, ngraph::PartialShape const&> (__a=..., this=0x7fffffff8ac8) at /usr/include/c++/5/bits/shared_ptr_base.h:617
#5  __shared_ptr<std::allocator, ngraph::element::Type const&, ngraph::PartialShape const&> (__a=..., __tag=..., this=0x7fffffff8ac0) at /usr/include/c++/5/bits/shared_ptr_base.h:1097
#6  shared_ptr<std::allocator, ngraph::element::Type const&, ngraph::PartialShape const&> (__a=..., __tag=..., this=0x7fffffff8ac0) at /usr/include/c++/5/bits/shared_ptr.h:319
#7  allocate_shared<ngraph::op::v0::Parameter, std::allocator, ngraph::element::Type const&, ngraph::PartialShape const&> (__a=...) at /usr/include/c++/5/bits/shared_ptr.h:620
#8  make_shared<ngraph::op::v0::Parameter, ngraph::element::Type const&, ngraph::PartialShape const&> () at /usr/include/c++/5/bits/shared_ptr.h:636
#9  get_ng_parameter (this=0x7fffffff8b20) at /home/personal/work/liujingxiu/onnxruntime/build/Linux/RelWithDebInfo/ngraph/src/project_ngraph/src/ngraph/frontend/onnx_import/core/value_info.hpp:102
#10 get_ng_node (initializers=..., parameters=..., this=0x7fffffff8b20) at /home/personal/work/liujingxiu/onnxruntime/build/Linux/RelWithDebInfo/ngraph/src/project_ngraph/src/ngraph/frontend/onnx_import/core/value_info.hpp:95
#11 ngraph::onnx_import::Graph::Graph (this=0x7fffffff8e70, graph_proto=..., model=...) at /home/personal/work/liujingxiu/onnxruntime/build/Linux/RelWithDebInfo/ngraph/src/project_ngraph/src/ngraph/frontend/onnx_import/core/graph.cpp:127
#12 0x00007ffff51fe8b0 in ngraph::onnx_import::import_onnx_model (stream=...) at /home/personal/work/liujingxiu/onnxruntime/build/Linux/RelWithDebInfo/ngraph/src/project_ngraph/src/ngraph/frontend/onnx_import/onnx.cpp:73
#13 0x00007ffff666e132 in onnxruntime::openvino_ep::backend_utils::CreateCNNNetwork (model_proto=..., device_id=..., precision=...) at /home/personal/work/liujingxiu/onnxruntime/onnxruntime/core/providers/openvino/backend_utils.cc:51
#14 0x00007ffff6661a91 in onnxruntime::openvino_ep::BasicBackend::BasicBackend (this=0x850b70, model_proto=..., global_context=..., subgraph_context=...) at /home/personal/work/liujingxiu/onnxruntime/onnxruntime/core/providers/openvino/backends/basic_backend.cc:29
#15 0x00007ffff665a9c7 in construct<onnxruntime::openvino_ep::BasicBackend, onnx::ModelProto const&, onnxruntime::openvino_ep::GlobalContext&, onnxruntime::openvino_ep::SubGraphContext const&> (__p=<optimized out>, this=<optimized out>) at /usr/include/c++/5/ext/new_allocator.h:120
#16 construct<onnxruntime::openvino_ep::BasicBackend, onnx::ModelProto const&, onnxruntime::openvino_ep::GlobalContext&, onnxruntime::openvino_ep::SubGraphContext const&> (__p=<optimized out>, __a=...) at /usr/include/c++/5/bits/alloc_traits.h:530
#17 _Sp_counted_ptr_inplace<onnx::ModelProto const&, onnxruntime::openvino_ep::GlobalContext&, onnxruntime::openvino_ep::SubGraphContext const&> (__a=..., this=0x850b60) at /usr/include/c++/5/bits/shared_ptr_base.h:522
#18 __shared_count<onnxruntime::openvino_ep::BasicBackend, std::allocator, onnx::ModelProto const&, onnxruntime::openvino_ep::GlobalContext&, onnxruntime::openvino_ep::SubGraphContext const&> (__a=..., this=<optimized out>) at /usr/include/c++/5/bits/shared_ptr_base.h:617
#19 __shared_ptr<std::allocator, onnx::ModelProto const&, onnxruntime::openvino_ep::GlobalContext&, onnxruntime::openvino_ep::SubGraphContext const&> (__a=..., __tag=..., this=<optimized out>) at /usr/include/c++/5/bits/shared_ptr_base.h:1097
#20 shared_ptr<std::allocator, onnx::ModelProto const&, onnxruntime::openvino_ep::GlobalContext&, onnxruntime::openvino_ep::SubGraphContext const&> (__a=..., __tag=..., this=<optimized out>) at /usr/include/c++/5/bits/shared_ptr.h:319
#21 allocate_shared<onnxruntime::openvino_ep::BasicBackend, std::allocator, onnx::ModelProto const&, onnxruntime::openvino_ep::GlobalContext&, onnxruntime::openvino_ep::SubGraphContext const&> (__a=...) at /usr/include/c++/5/bits/shared_ptr.h:620
#22 make_shared<onnxruntime::openvino_ep::BasicBackend, onnx::ModelProto const&, onnxruntime::openvino_ep::GlobalContext&, onnxruntime::openvino_ep::SubGraphContext const&> () at /usr/include/c++/5/bits/shared_ptr.h:636
#23 onnxruntime::openvino_ep::BackendFactory::MakeBackend (model_proto=..., global_context=..., subgraph_context=...) at /home/personal/work/liujingxiu/onnxruntime/onnxruntime/core/providers/openvino/backends/backend_factory.cc:21
#24 0x00007ffff665966f in onnxruntime::openvino_ep::BackendManager::Compute (this=0x87c820, api=..., context=0x7fffffffbb50) at /home/personal/work/liujingxiu/onnxruntime/onnxruntime/core/providers/openvino/backend_manager.cc:264
#25 0x00007ffff6649499 in operator() (__closure=<optimized out>, context=<optimized out>, api=<optimized out>, state=<optimized out>) at /home/personal/work/liujingxiu/onnxruntime/onnxruntime/core/providers/openvino/openvino_execution_provider.cc:967
#26 std::_Function_handler<onnxruntime::common::Status(void*, const OrtApi*, OrtKernelContext*), onnxruntime::OpenVINOExecutionProvider::Compile(const std::vector<onnxruntime::Node*>&, std::vector<NodeComputeInfo>&)::<lambda(onnxruntime::FunctionState, const OrtApi*, OrtKernelContext*)> >::_M_invoke(const std::_Any_data &, <unknown type in /data/liujingxiu/Torch-Vs-ONNX/Personal-Model/test_onnxruntime-openvino/libonnxruntime/onnxruntime/lib/libonnxruntime.so.1.3.0, CU 0x91dc04, DIE 0xa01281>, <unknown type in /data/liujingxiu/Torch-Vs-ONNX/Personal-Model/test_onnxruntime-openvino/libonnxruntime/onnxruntime/lib/libonnxruntime.so.1.3.0, CU 0x91dc04, DIE 0xa01286>, <unknown type in /data/liujingxiu/Torch-Vs-ONNX/Personal-Model/test_onnxruntime-openvino/libonnxruntime/onnxruntime/lib/libonnxruntime.so.1.3.0, CU 0x91dc04, DIE 0xa0128b>) (__functor=..., __args#0=<optimized out>, __args#1=<optimized out>, __args#2=<optimized out>) at /usr/include/c++/5/functional:1857
#27 0x00007ffff6a1a1f0 in operator() (__args#2=0x7fffffffbb50, __args#1=0x7ffff70f9f00, __args#0=0x84a510, this=0x87f530) at /usr/include/c++/5/functional:2267
#28 onnxruntime::FunctionKernel::Compute (this=0x87f4e0, context=0x7fffffffbb50) at /home/personal/work/liujingxiu/onnxruntime/onnxruntime/core/framework/func_kernel.h:41
#29 0x00007ffff6a651a7 in onnxruntime::SequentialExecutor::Execute(onnxruntime::SessionState const&, std::vector<int, std::allocator<int> > const&, std::vector<OrtValue, std::allocator<OrtValue> > const&, std::vector<int, std::allocator<int> > const&, std::vector<OrtValue, std::allocator<OrtValue> >&, std::unordered_map<unsigned long, std::function<onnxruntime::common::Status (onnxruntime::TensorShape const&, OrtMemoryInfo const&, OrtValue&, bool&)>, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<std::pair<unsigned long const, std::function<onnxruntime::common::Status (onnxruntime::TensorShape const&, OrtMemoryInfo const&, OrtValue&, bool&)> > > > const&, onnxruntime::logging::Logger const&) (this=0x84c2e0, session_state=..., feed_mlvalue_idxs=..., feeds=..., fetch_mlvalue_idxs=..., fetches=..., fetch_allocators=..., logger=...) at /home/personal/work/liujingxiu/onnxruntime/onnxruntime/core/framework/sequential_executor.cc:271
#30 0x00007ffff6a521f7 in onnxruntime::utils::ExecuteGraphImpl(const onnxruntime::SessionState &, const onnxruntime::FeedsFetchesManager &, const std::vector<OrtValue, std::allocator<OrtValue> > &, std::vector<OrtValue, std::allocator<OrtValue> > &, const std::unordered_map<long unsigned int, std::function<onnxruntime::common::Status(const onnxruntime::TensorShape&, const OrtMemoryInfo&, OrtValue&, bool&)>, std::hash<long unsigned int>, std::equal_to<long unsigned int>, std::allocator<std::pair<long unsigned int const, std::function<onnxruntime::common::Status(const onnxruntime::TensorShape&, const OrtMemoryInfo&, OrtValue&, bool&)> > > > &, ExecutionMode, const bool &, const onnxruntime::logging::Logger &, bool) (session_state=..., feeds_fetches_manager=..., feeds=..., fetches=..., fetch_allocators=..., execution_mode=ORT_SEQUENTIAL, terminate_flag=@0x7fffffffdd48: false, logger=..., only_execute_path_to_fetches=false) at /home/personal/work/liujingxiu/onnxruntime/onnxruntime/core/framework/utils.cc:479
#31 0x00007ffff6a53a8a in onnxruntime::utils::ExecuteGraph (session_state=..., feeds_fetches_manager=..., feeds=..., fetches=..., execution_mode=ORT_SEQUENTIAL, terminate_flag=@0x7fffffffdd48: false, logger=..., only_execute_path_to_fetches=false) at /home/personal/work/liujingxiu/onnxruntime/onnxruntime/core/framework/utils.cc:539
#32 0x00007ffff6633601 in onnxruntime::InferenceSession::Run (this=this@entry=0x82f7f0, run_options=..., feed_names=..., feeds=..., output_names=..., p_fetches=0x7fffffffdcc0) at /home/personal/work/liujingxiu/onnxruntime/onnxruntime/core/session/inference_session.cc:1163
#33 0x00007ffff66062f1 in OrtApis::Run (sess=0x82f7f0, run_options=0x0, input_names=<optimized out>, input=<optimized out>, input_len=<optimized out>, output_names1=<optimized out>, output_names_len=1, output=0x7fffffffde10) at /home/personal/work/liujingxiu/onnxruntime/onnxruntime/core/session/onnxruntime_c_api.cc:503
#34 0x0000000000404e0b in main ()
```

Liujingxiu23 commented 4 years ago

I built onnxruntime again and found that not all tests passed: 4/5 passed, 1 failed. The related log is:

```
1: [----------] 1 test from ParallelExecutor
1: [ RUN ] ParallelExecutor.TestStatusPropagation
1: 2020-06-11 07:44:13.113890902 [E:onnxruntime:TestOp:TestOp, parallel_executor.cc:204 RunNodeAsync] Non-zero status code returned while running TestOp node. Name:'node1' Status Message: Action was 1
1: 2020-06-11 07:44:13.113926203 [E:onnxruntime:TestOp:TestOp, parallel_executor.cc:75 Execute] [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running TestOp node. Name:'node1' Status Message: Action was 1
1: 2020-06-11 07:44:13.129843868 [E:onnxruntime:TestOp:TestOp, parallel_executor.cc:204 RunNodeAsync] Non-zero status code returned while running TestOp node. Name:'node1' Status Message: Action was 1
1: 2020-06-11 07:44:13.129880202 [E:onnxruntime:TestOp:TestOp, parallel_executor.cc:75 Execute] [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running TestOp node. Name:'node1' Status Message: Action was 1
1: 2020-06-11 07:44:13.133232185 [E:onnxruntime:TestOp:TestOp, parallel_executor.cc:204 RunNodeAsync] Non-zero status code returned while running TestOp node. Name:'node1' Status Message: /home/personal/work/liujingxiu/onnxruntime/onnxruntime/test/framework/parallel_executor_test.cc:59 virtual onnxruntime::common::Status onnxruntime::test::TestOp::OpKernelImpl::Compute(onnxruntime::OpKernelContext*) const Throwing as action was 2
1:-
1: 2020-06-11 07:44:13.133259500 [E:onnxruntime:TestOp:TestOp, parallel_executor.cc:75 Execute] [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running TestOp node. Name:'node1' Status Message: /home/personal/work/liujingxiu/onnxruntime/onnxruntime/test/framework/parallel_executor_test.cc:59 virtual onnxruntime::common::Status onnxruntime::test::TestOp::OpKernelImpl::Compute(onnxruntime::OpKernelContext*) const Throwing as action was 2
1:-
1: 2020-06-11 07:44:13.148291937 [E:onnxruntime:TestOp:TestOp, parallel_executor.cc:204 RunNodeAsync] Non-zero status code returned while running TestOp node. Name:'node1' Status Message: /home/personal/work/liujingxiu/onnxruntime/onnxruntime/test/framework/parallel_executor_test.cc:59 virtual onnxruntime::common::Status onnxruntime::test::TestOp::OpKernelImpl::Compute(onnxruntime::OpKernelContext*) const Throwing as action was 2
1:-
1: 2020-06-11 07:44:13.148322374 [E:onnxruntime:TestOp:TestOp, parallel_executor.cc:75 Execute] [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running TestOp node. Name:'node1' Status Message: /home/personal/work/liujingxiu/onnxruntime/onnxruntime/test/framework/parallel_executor_test.cc:59 virtual onnxruntime::common::Status onnxruntime::test::TestOp::OpKernelImpl::Compute(onnxruntime::OpKernelContext*) const Throwing as action was 2
```

yurygorbachev commented 4 years ago

Can you check whether your model works through the OpenVINO API directly?

Liujingxiu23 commented 4 years ago

I want to use ort since the API is really clear and simple, like torch and tensorflow. As for the benchmark_app example, its usage seems quite complicated to me.

yurygorbachev commented 4 years ago

Perhaps the issue should be addressed to onnxruntime then?

tomdol commented 4 years ago

@Liujingxiu23 thanks for the backtrace. The segfault happens when the ONNX model is imported from ONNXRuntime into the OV provider. The backtrace indicates that it crashes when one of the inputs gets translated to an ngraph::Parameter, whose constructor segfaults. Did you see the exact line in the constructor's body where the access violation happened? Also, there's a suggestion in the logs you shared to run `debuginfo-install glibc-2.17-222.el7.x86_64`; once you install that, you might get an even more detailed stack trace.

The whole segfault looks very model-specific to me. Can you also share which model you used for this test and whether it's publicly available?

Liujingxiu23 commented 4 years ago

@tomdol I'll try to install glibc-2.17-222.el7.x86_64. I'm sorry, I cannot share the model. But the model structure is very simple: a 3-layer uni-directional LSTM with input size [-1, 160, 80], where -1 means a dynamic batch size. My subject is not images, so the input is not NCHW or any similar layout.

tomdol commented 4 years ago

> where -1 means dynamic batch-size

Just a suggestion: can you edit the model and use a dimension variable instead of -1? Just replace `dim_value: -1` with something like this: https://github.com/openvinotoolkit/openvino/blob/master/ngraph/test/models/onnx/dynamic_shapes/ab_plus_c.prototxt#L25-L27

This is how we (in the onnx importer) expect dynamic dimensions to be defined in ONNX models, and it adheres to this specification: https://github.com/onnx/onnx/blob/master/docs/IR.md#static-tensor-shapes
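Concretely, the graph input's type would then look roughly like this: a sketch following the linked ab_plus_c.prototxt, where the dimension name "batch_size" is arbitrary, the input name is taken from your log, and the 160/80 sizes come from your model description:

```
input {
  name: "inputs"
  type {
    tensor_type {
      elem_type: 1  # FLOAT
      shape {
        dim { dim_param: "batch_size" }  # was: dim { dim_value: -1 }
        dim { dim_value: 160 }
        dim { dim_value: 80 }
      }
    }
  }
}
```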

Liujingxiu23 commented 4 years ago

@tomdol Thank you for your reply! I will try to reset the batch size, and I will also try a more common model, for example ResNet-18, to verify whether the current problem occurs only with my own model.

Liujingxiu23 commented 4 years ago

I rebuilt onnxruntime rel-1.3.0 using l_openvino_toolkit_p_2020.2.120.tgz, and model prediction now completes successfully. Maybe the core dump happened because I built openvino the wrong way or used the wrong release version of onnxruntime. Thank you very much for your help! @tomdol

tomdol commented 4 years ago

No problem, I'm glad it worked for you :) BTW, can we close this ticket?