microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

Crash in TensorrtExecutionProvider when TensorRT EP fails to create engine from network #21567

Open frenetj opened 3 months ago

frenetj commented 3 months ago

Describe the issue

When the TensorRT EP fails to create an engine from the network and the client calls Run() again on the same session, the following crash occurs:

```
#0  0x00007efc5442df84 in nvinfer1::ICudaEngine::getNbIOTensors() const (this=0x0) at tensort/include/NvInferRuntime.h:2160
#1  0x00007efc54451cf8 in onnxruntime::TensorrtExecutionProvider::<lambda(onnxruntime::FunctionState, const OrtApi*, OrtKernelContext*)>::operator()(onnxruntime::FunctionState, const OrtApi*, OrtKernelContext*) const (__closure=0x7efbfb1d8098, state=0x7efbfc81bf80, api=0x7f02b6d0b2e0 <ort_api_1_to_18>, context=0x7fff94d9ce50) at onnxruntime-1.18.0/onnxruntime/core/providers/tensorrt/tensorrt_execution_provider.cc:3395
#2  0x00007efc54487e8c in std::_Function_handler<onnxruntime::common::Status(void*, const OrtApi*, OrtKernelContext*), onnxruntime::TensorrtExecutionProvider::CreateNodeComputeInfoFromGraph(const onnxruntime::GraphViewer&, const onnxruntime::Node&, std::unordered_map<std::__cxx11::basic_string, long unsigned int>&, std::unordered_map<std::__cxx11::basic_string, long unsigned int>&, std::vector&)::<lambda(onnxruntime::FunctionState, const OrtApi*, OrtKernelContext*)> >::_M_invoke(const std::_Any_data &, void*&&, const OrtApi*&&, OrtKernelContext*&&) (__functor=..., __args#0=@0x7fff94d9cbb8: 0x7efbfc81bf80, __args#1=@0x7fff94d9cbb0: 0x7f02b6d0b2e0 <ort_api_1_to_18>, __args#2=@0x7fff94d9cba8: 0x7fff94d9ce50) at /usr/include/c++/8/bits/std_function.h:283
#3  0x00007f02b59addac in std::function<onnxruntime::common::Status (void*, OrtApi const*, OrtKernelContext*)>::operator()(void*, OrtApi const*, OrtKernelContext*) const (this=0x7efbfb1d8098, __args#0=0x7efbfc81bf80, __args#1=0x7f02b6d0b2e0 <ort_api_1_to_18>, __args#2=0x7fff94d9ce50) at /usr/include/c++/8/bits/std_function.h:687
#4  0x00007f02b59a76b9 in onnxruntime::FunctionKernel::Compute(onnxruntime::OpKernelContext*) const (this=0x7efc014e2c00, context=0x7fff94d9ce50) at onnxruntime-1.18.0/onnxruntime/core/framework/func_kernel.h:52
#5  0x00007f02b5ac7d5c in onnxruntime::ExecuteKernel(onnxruntime::StreamExecutionContext&, unsigned long, unsigned long, bool const&, onnxruntime::SessionScope&) (ctx=..., idx=4937, stream_idx=0, terminate_flag=@0x2716f308: false, session_scope=...) at onnxruntime-1.18.0/onnxruntime/core/framework/sequential_executor.cc:495
#6  0x00007f02b5abef4c in onnxruntime::LaunchKernelStep::Execute(onnxruntime::StreamExecutionContext&, unsigned long, onnxruntime::SessionScope&, bool const&, bool&) (this=0x3587a8e0, ctx=..., stream_idx=0, session_scope=..., terminate_flag=@0x2716f308: false, continue_flag=@0x7fff94d9d51f: true) at onnxruntime-1.18.0/onnxruntime/core/framework/execution_steps.cc:73
#7  0x00007f02b5acb5a3 in onnxruntime::RunSince(unsigned long, onnxruntime::StreamExecutionContext&, onnxruntime::SessionScope&, bool const&, unsigned long) (stream_idx=0, ctx=..., session_scope=..., terminate_flag=@0x2716f308: false, since=0) at onnxruntime-1.18.0/onnxruntime/core/framework/stream_execution_context.cc:222
#8  0x00007f02b5ac827b in onnxruntime::<lambda()>::operator()(void) const (__closure=0x7efc017dc3b0) at onnxruntime-1.18.0/onnxruntime/core/framework/sequential_executor.cc:589
#9  0x00007f02b5ac992f in std::_Function_handler<void(), onnxruntime::ExecuteThePlan(const onnxruntime::SessionState&, gsl::span, gsl::span, gsl::span, std::vector&, const std::unordered_map<long unsigned int, std::function<onnxruntime::common::Status(const onnxruntime::TensorShape&, const OrtDevice&, OrtValue&, bool&)> >&, const onnxruntime::logging::Logger&, const onnxruntime::DeviceStreamCollection*, bool const&, bool, bool)::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...) at /usr/include/c++/8/bits/std_function.h:297
#10 0x00007f02b4e39dac in std::function<void ()>::operator()() const (this=0x7fff94d9dbf0) at /usr/include/c++/8/bits/std_function.h:687
#11 0x00007f02b4e1ad49 in onnxruntime::concurrency::ThreadPool::Schedule(onnxruntime::concurrency::ThreadPool*, std::function<void ()>) (tp=0x0, fn=...) at onnxruntime-1.18.0/include/onnxruntime/core/platform/threadpool.h:233
#12 0x00007f02b5ac8608 in onnxruntime::ExecuteThePlan(onnxruntime::SessionState const&, gsl::span<int const, 18446744073709551615ul>, gsl::span<OrtValue const, 18446744073709551615ul>, gsl::span<int const, 18446744073709551615ul>, std::vector<OrtValue, std::allocator >&, std::unordered_map<unsigned long, std::function<onnxruntime::common::Status (onnxruntime::TensorShape const&, OrtDevice const&, OrtValue&, bool&)>, std::hash, std::equal_to, std::allocator<std::pair<unsigned long const, std::function<onnxruntime::common::Status (onnxruntime::TensorShape const&, OrtDevice const&, OrtValue&, bool&)> > > > const&, onnxruntime::logging::Logger const&, onnxruntime::DeviceStreamCollection const*, bool const&, bool, bool) (session_state=..., feed_mlvalue_idxs=..., feeds=..., fetch_mlvalue_idxs=..., fetches=std::vector of length 2, capacity 2 = {...}, fetch_allocators=std::unordered_map with 0 elements, logger=..., device_streams=0x1dbb3080, terminate_flag=@0x2716f308: false, only_execute_path_to_fetches=false, single_thread_mode=true) at onnxruntime-1.18.0/onnxruntime/core/framework/sequential_executor.cc:588
#13 0x00007f02b5a68157 in onnxruntime::utils::ExecuteGraphImpl(const onnxruntime::SessionState &, const onnxruntime::FeedsFetchesManager &, gsl::span<OrtValue const, 18446744073709551615>, std::vector<OrtValue, std::allocator > &, const std::unordered_map<long unsigned int, std::function<onnxruntime::common::Status(const onnxruntime::TensorShape&, const OrtDevice&, OrtValue&, bool&)>, std::hash, std::equal_to, std::allocator<std::pair<long unsigned int const, std::function<onnxruntime::common::Status(const onnxruntime::TensorShape&, const OrtDevice&, OrtValue&, bool&)> > > > &, ExecutionMode, const bool &, const onnxruntime::logging::Logger &, onnxruntime::DeviceStreamCollection*, bool, onnxruntime::Stream*) (session_state=..., feeds_fetches_manager=..., feeds=..., fetches=std::vector of length 2, capacity 2 = {...}, fetch_allocators=std::unordered_map with 0 elements, execution_mode=ORT_SEQUENTIAL, terminate_flag=@0x2716f308: false, logger=..., device_stream_collection=0x1dbb3080, only_execute_path_to_fetches=false, parent_stream=0x0) at onnxruntime-1.18.0/onnxruntime/core/framework/utils.cc:706
#14 0x00007f02b5a6878e in onnxruntime::utils::ExecuteGraph(onnxruntime::SessionState const&, onnxruntime::FeedsFetchesManager&, gsl::span<OrtValue const, 18446744073709551615ul>, std::vector<OrtValue, std::allocator >&, ExecutionMode, bool const&, onnxruntime::logging::Logger const&, onnxruntime::DeviceStreamCollectionHolder&, bool, onnxruntime::Stream*) (session_state=..., feeds_fetches_manager=..., feeds=..., fetches=std::vector of length 2, capacity 2 = {...}, execution_mode=ORT_SEQUENTIAL, terminate_flag=@0x2716f308: false, logger=..., device_stream_collection_holder=..., only_execute_path_to_fetches=false, parent_stream=0x0) at onnxruntime-1.18.0/onnxruntime/core/framework/utils.cc:755
#15 0x00007f02b5a68868 in onnxruntime::utils::ExecuteGraph(onnxruntime::SessionState const&, onnxruntime::FeedsFetchesManager&, gsl::span<OrtValue const, 18446744073709551615ul>, std::vector<OrtValue, std::allocator >&, ExecutionMode, OrtRunOptions const&, onnxruntime::DeviceStreamCollectionHolder&, onnxruntime::logging::Logger const&) (session_state=..., feeds_fetches_manager=..., feeds=..., fetches=std::vector of length 2, capacity 2 = {...}, execution_mode=ORT_SEQUENTIAL, run_options=..., device_stream_collection_holder=..., logger=...) at onnxruntime-1.18.0/onnxruntime/core/framework/utils.cc:782
#16 0x00007f02b4e33fd5 in onnxruntime::InferenceSession::Run(OrtRunOptions const&, gsl::span<std::__cxx11::basic_string<char, std::char_traits, std::allocator > const, 18446744073709551615ul>, gsl::span<OrtValue const, 18446744073709551615ul>, gsl::span<std::__cxx11::basic_string<char, std::char_traits, std::allocator > const, 18446744073709551615ul>, std::vector<OrtValue, std::allocator >*, std::vector<OrtDevice, std::allocator > const*) (this=0x23f71cf0, run_options=..., feed_names=..., feeds=..., output_names=..., p_fetches=0x7fff94d9f1f0, p_fetches_device_info=0x0) at onnxruntime-1.18.0/onnxruntime/core/session/inference_session.cc:2531
#17 0x00007f02b4e351bc in onnxruntime::InferenceSession::Run(OrtRunOptions const&, gsl::span<char const* const, 18446744073709551615ul>, gsl::span<OrtValue const* const, 18446744073709551615ul>, gsl::span<char const* const, 18446744073709551615ul>, gsl::span<OrtValue*, 18446744073709551615ul>) (this=0x23f71cf0, run_options=..., feed_names=..., feeds=..., fetch_names=..., fetches=...) at onnxruntime-1.18.0/onnxruntime/core/session/inference_session.cc:2659
#18 0x00007f02b4d42116 in OrtApis::Run(OrtSession*, OrtRunOptions const*, char const* const*, OrtValue const* const*, unsigned long, char const* const*, unsigned long, OrtValue**) (sess=0x23f71cf0, run_options=0x2716f2e0, input_names=0x1b75aff0, input=0x7efc5550bba0, input_len=2, output_names=0x1dea9570, output_names_len=2, output=0x7efbf802c200) at onnxruntime-1.18.0/onnxruntime/core/session/onnxruntime_c_api.cc:831
```

To reproduce

Run inference on a model that is too large to be cached (or force the TensorRT EP to return the error "TensorRT EP failed to create engine from network."). Then run inference again on the same session: it crashes.
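
As a rough illustration only (not the reporter's actual code), a minimal sketch of the two-Run() sequence using the ONNX Runtime C++ API; the model path, tensor shape, and input/output names are placeholders:

```cpp
// Minimal sketch: create a session with the TensorRT EP and call Run() twice.
// With 1.18.0, the first Run() fails to build the engine and the second Run()
// dereferences the null ICudaEngine shown in frame #0 of the trace above.
#include <onnxruntime_cxx_api.h>
#include <vector>

int main() {
  Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "trt-crash-repro");

  Ort::SessionOptions so;
  OrtTensorRTProviderOptions trt_options{};  // default options
  so.AppendExecutionProvider_TensorRT(trt_options);

  // "model.onnx" is a placeholder for a model whose TRT engine build fails.
  Ort::Session session(env, "model.onnx", so);

  // Placeholder I/O names and a dummy input tensor; adjust to the model.
  const char* input_names[] = {"input"};
  const char* output_names[] = {"output"};
  std::vector<float> data(1 * 3 * 224 * 224, 0.f);
  std::vector<int64_t> shape{1, 3, 224, 224};
  Ort::MemoryInfo mem = Ort::MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);
  Ort::Value input = Ort::Value::CreateTensor<float>(mem, data.data(), data.size(),
                                                     shape.data(), shape.size());

  for (int i = 0; i < 2; ++i) {
    try {
      // 1st call: "TensorRT EP failed to create engine from network."
      // 2nd call: crash instead of a second error.
      auto out = session.Run(Ort::RunOptions{nullptr}, input_names, &input, 1,
                             output_names, 1);
    } catch (const Ort::Exception&) {
      // The first failure surfaces here; the crash happens on the retry.
    }
  }
  return 0;
}
```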

Urgency

No response

Platform

Linux

OS Version

Rocky Linux 8.5 (gcc 11.2.1, C++17)

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

1.18.0

ONNX Runtime API

C

Architecture

X64

Execution Provider

TensorRT

Execution Provider Library Version

CUDA 11.8

yf711 commented 3 months ago

Could you try building ORT from this branch and see if that stops the crash?
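
The contents of the linked branch are not quoted in this thread; as a hedged sketch only of the kind of guard that avoids the null dereference seen in frame #0 (a hypothetical helper, not the actual patch):

```cpp
// Hypothetical illustration, not the patch from the linked branch.
// The crash is a null-pointer dereference: after a failed engine build the
// cached nvinfer1::ICudaEngine stays null, and the next Run() queries it.
#include <NvInferRuntime.h>

// Returns the engine's I/O tensor count, or -1 when the engine was never
// created (e.g. "TensorRT EP failed to create engine from network").
// In the real EP, the failure would be surfaced as an
// onnxruntime::common::Status from the compute function rather than -1.
int SafeGetNbIOTensors(const nvinfer1::ICudaEngine* engine) {
  if (engine == nullptr) {
    return -1;  // report an error instead of crashing on retry
  }
  return engine->getNbIOTensors();
}
```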

frenetj commented 3 months ago

Hi Yifan,

Thanks for the quick fix; it works perfectly!

However, while compiling your branch with TensorRT 8.5.3, we got the following errors:

```
/git/onnxruntime/onnxruntime/core/providers/tensorrt/tensorrt_execution_provider.cc: In member function 'onnxruntime::common::Status onnxruntime::TensorrtExecutionProvider::CreateNodeComputeInfoFromGraph(const onnxruntime::GraphViewer&, const onnxruntime::Node&, std::unordered_map<std::__cxx11::basic_string<char>, long unsigned int>&, std::unordered_map<std::__cxx11::basic_string<char>, long unsigned int>&, std::vector<onnxruntime::NodeComputeInfo>&)':
/git/onnxruntime/onnxruntime/core/providers/tensorrt/tensorrt_execution_provider.cc:3055:17: error: 'class nvinfer1::IBuilderConfig' has no member named 'setHardwareCompatibilityLevel'
 3055 |     trt_config->setHardwareCompatibilityLevel(nvinfer1::HardwareCompatibilityLevel::kAMPERE_PLUS);
      |                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/git/onnxruntime/onnxruntime/core/providers/tensorrt/tensorrt_execution_provider.cc:3055:57: error: 'nvinfer1::HardwareCompatibilityLevel' has not been declared
 3055 |     trt_config->setHardwareCompatibilityLevel(nvinfer1::HardwareCompatibilityLevel::kAMPERE_PLUS);
      |                                                         ^~~~~~~~~~~~~~~~~~~~~~~~~~
/git/onnxruntime/onnxruntime/core/providers/tensorrt/tensorrt_execution_provider.cc: In lambda function:
/git/onnxruntime/onnxruntime/core/providers/tensorrt/tensorrt_execution_provider.cc:3644:21: error: 'class nvinfer1::IBuilderConfig' has no member named 'setHardwareCompatibilityLevel'
 3644 |         trt_config->setHardwareCompatibilityLevel(nvinfer1::HardwareCompatibilityLevel::kAMPERE_PLUS);
      |                     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/git/onnxruntime/onnxruntime/core/providers/tensorrt/tensorrt_execution_provider.cc:3644:61: error: 'nvinfer1::HardwareCompatibilityLevel' has not been declared
 3644 |         trt_config->setHardwareCompatibilityLevel(nvinfer1::HardwareCompatibilityLevel::kAMPERE_PLUS);
      |                                                             ^~~~~~~~~~~~~~~~~~~~~~~~~~
gmake[2]: *** [CMakeFiles/onnxruntime_providers_tensorrt.dir/build.make:146: CMakeFiles/onnxruntime_providers_tensorrt.dir/git/onnxruntime/onnxruntime/core/providers/tensorrt/tensorrt_execution_provider.cc.o] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:2267: CMakeFiles/onnxruntime_providers_tensorrt.dir/all] Error 2
```

which we fixed by wrapping the calls to trt_config->setHardwareCompatibilityLevel in `#if NV_TENSORRT_MAJOR >= 10` guards:

```diff
diff --git a/onnxruntime/core/providers/tensorrt/tensorrt_execution_provider.cc b/onnxruntime/core/providers/tensorrt/tensorrt_execution_provider.cc
index 2df4611743..b1e7147ea1 100644
--- a/onnxruntime/core/providers/tensorrt/tensorrt_execution_provider.cc
+++ b/onnxruntime/core/providers/tensorrt/tensorrt_execution_provider.cc
@@ -3051,12 +3051,13 @@ Status TensorrtExecutionProvider::CreateNodeComputeInfoFromGraph(const GraphView
   std::string cache_hw_compat = "_sm" + compute_capability_;
   // Enable hardware compatility mode if assigned
+#if NV_TENSORRT_MAJOR >= 10
   if (engine_cache_enable_ && engine_hw_compatible_) {
     trt_config->setHardwareCompatibilityLevel(nvinfer1::HardwareCompatibilityLevel::kAMPERE_PLUS);
     cache_hw_compat = "_sm80+";
     LOGS_DEFAULT(VERBOSE) << "[TensorRT EP] Hardware compatibility is enabled when loading and capturing engine cache.";
   }
+#endif
   // Name the engine cache based on GPU compute capacity and reduce the chance of loading an incompatible cache
   // Note: Engine cache generated on a GPU with large memory might not be loadable on a GPU with smaller memory, even if they share the same compute capacity
   const std::string cache_path_prefix = cache_path + cache_hw_compat;
@@ -3639,12 +3640,13 @@ Status TensorrtExecutionProvider::CreateNodeComputeInfoFromGraph(const GraphView
     }
   }
+#if NV_TENSORRT_MAJOR >= 10
   // Enable hardware compatility mode if assigned
   if (trt_state->engine_hw_compatible) {
     trt_config->setHardwareCompatibilityLevel(nvinfer1::HardwareCompatibilityLevel::kAMPERE_PLUS);
     LOGS_DEFAULT(INFO) << "[TensorRT EP] Re-generate engine with hardware compatibility enabled.";
   }
+#endif
   // Build engine
   std::unique_ptr<nvinfer1::IHostMemory> serialized_engine;
   {
```

Would it be possible for you to also make this change?

frenetj commented 3 months ago

Note that GitHub's formatting is not rendering the second part of the above comment properly; please read it as plain text.

yf711 commented 3 months ago

Hi @frenetj, ORT has supported TRT 8.6 since 1.15 and adds features that are incompatible with older TRT 8.x releases. Please see the TRT version requirements at https://onnxruntime.ai/docs/execution-providers/TensorRT-ExecutionProvider.html. We recommend using the latest TRT 10.x, as ORT will gradually phase out TRT 8.6 support in the future.
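
For builds that must not silently pick up an older TensorRT, a hedged sketch of a compile-time check (NV_TENSORRT_MAJOR and NV_TENSORRT_MINOR come from TensorRT's NvInferVersion.h; the 8.6 cutoff follows the comment and the requirements page linked above, not this issue's code):

```cpp
// Sketch only: fail the build early when the TensorRT headers are older than
// what ORT >= 1.15 expects, instead of hitting missing-member compile errors
// like the setHardwareCompatibilityLevel ones reported above.
#include <NvInferVersion.h>

#if NV_TENSORRT_MAJOR < 8 || (NV_TENSORRT_MAJOR == 8 && NV_TENSORRT_MINOR < 6)
#error "This ONNX Runtime build expects TensorRT 8.6 or newer (TensorRT 10.x recommended)."
#endif
```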

frenetj commented 2 months ago

Hello @yf711, using TRT 8.6 works perfectly with this fix. Thanks a lot!

frenetj commented 1 week ago

Hello @yf711, the fix doesn't seem to have been integrated into the latest release (1.19.2).

yf711 commented 1 week ago

Hi @frenetj, thanks for the notice. I just found that my fix didn't make it into 1.19, but it will be included in the upcoming 1.20 release, which is targeted for early next month. You can also build from the rel-1.20.0 branch and see if that works as expected in your case.