microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

A bug occurs when the program terminates #15174

Open busishengui opened 1 year ago

busishengui commented 1 year ago

Describe the issue

It works well when running on the GPU, but the following error occurs when the program terminates:

terminate called after throwing an instance of 'onnxruntime::OnnxRuntimeException'
  what():  /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:122 bool onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*) [with ERRTYPE = cudaError; bool THRW = true] /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:116 bool onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*) [with ERRTYPE = cudaError; bool THRW = true] CUDA failure 4: driver shutting down ; GPU=806358777 ; hostname=lv-voice-rt-02 ; expr=cudaEventSynchronize(e);

To reproduce

    #include <memory>
    #include <onnxruntime_cxx_api.h>

    // Environment and session options (the statements below run inside a function).
    auto env = std::make_shared<Ort::Env>(ORT_LOGGING_LEVEL_WARNING, "RNNT-model");
    auto session_options = std::make_shared<Ort::SessionOptions>();
    session_options->SetInterOpNumThreads(1);
    session_options->SetIntraOpNumThreads(1);
    session_options->DisableCpuMemArena();
    session_options->SetGraphOptimizationLevel(ORT_ENABLE_ALL);

    // CUDA execution provider on device 0.
    auto options = std::make_shared<OrtCUDAProviderOptions>();
    options->device_id = 0;
    options->arena_extend_strategy = 1;  // 1 = kSameAsRequested
    options->cudnn_conv_algo_search = OrtCudnnConvAlgoSearch::OrtCudnnConvAlgoSearchDefault;
    options->do_copy_in_default_stream = -1;
    options->default_memory_arena_cfg = nullptr;
    session_options->AppendExecutionProvider_CUDA(*options);

Urgency

No response

Platform

Linux

OS Version

centos

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.12.1

ONNX Runtime API

C++

Architecture

X64

Execution Provider

CUDA

Execution Provider Library Version

CUDA 11.4

pranavsharma commented 1 year ago

Can you try the latest version of ORT? We've not seen reports of this behavior. So, detailed instructions on how to repro will be required.

satyajandhyala commented 1 year ago

Using the code from the latest main today, I could not reproduce this issue.

Cryolitia commented 1 year ago

This seems similar to #2804 and #10352.

Cryolitia commented 1 year ago

Hello, I'm a member of MaaAssistantArknights, and the same error occurs in our program.

ONNX Runtime version: 1.15.1, using the prebuilt package https://github.com/microsoft/onnxruntime/releases/download/v1.15.1/onnxruntime-linux-x64-gpu-1.15.1.tgz

Exception:

terminate called after throwing an instance of 'onnxruntime::OnnxRuntimeException'
  what():  /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:121 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:114 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] CUDA failure 4: driver shutting down ; GPU=2000772548 ; hostname=Cryolitia-nixos ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/cuda_allocator.cc ; line=99 ; expr=cudaFreeHost(p); 

core dump:

                #0  0x00007f31a856fd7c __pthread_kill_implementation (libc.so.6 + 0x8cd7c)
                #1  0x00007f31a85209c6 raise (libc.so.6 + 0x3d9c6)
                #2  0x00007f31a85098fa abort (libc.so.6 + 0x268fa)
                #3  0x00007f31a56a9a89 _ZN9__gnu_cxx27__verbose_terminate_handlerEv.cold (libstdc++.so.6 + 0xa9a89)
                #4  0x00007f31a56b4f8a _ZN10__cxxabiv111__terminateEPFvvE (libstdc++.so.6 + 0xb4f8a)
                #5  0x00007f31a56b3ff9 __cxa_call_terminate (libstdc++.so.6 + 0xb3ff9)
                #6  0x00007f31a56b4716 __gxx_personality_v0 (libstdc++.so.6 + 0xb4716)
                #7  0x00007f31a87c2864 _Unwind_RaiseException_Phase2 (libgcc_s.so.1 + 0x17864)
                #8  0x00007f31a87c32bd _Unwind_Resume (libgcc_s.so.1 + 0x182bd)
                #9  0x00007f31134e1364 _ZN11onnxruntime8CudaCallI9cudaErrorLb1EEENSt11conditionalIXT0_EvNS_6common6StatusEE4typeET_PKcS9_S7_S9_S9_i (libonnxruntime_providers_cuda.so + 0xe1364)
                #10 0x00007f31134dd91b _ZN11onnxruntime19CUDAPinnedAllocator4FreeEPv (libonnxruntime_providers_cuda.so + 0xdd91b)
                #11 0x00007f31a7172d7d n/a (libonnxruntime.so.1.15.1 + 0x972d7d)
                #12 0x00007f31a7172f3d n/a (libonnxruntime.so.1.15.1 + 0x972f3d)
                #13 0x00007f31134eebe2 _ZN11onnxruntime21CUDAExecutionProviderD1Ev (libonnxruntime_providers_cuda.so + 0xeebe2)
                #14 0x00007f31134eed1d _ZN11onnxruntime21CUDAExecutionProviderD0Ev (libonnxruntime_providers_cuda.so + 0xeed1d)
                #15 0x00007f31a6a72b8a n/a (libonnxruntime.so.1.15.1 + 0x272b8a)
                #16 0x00007f31a6a72d7d n/a (libonnxruntime.so.1.15.1 + 0x272d7d)
                #17 0x00007f31a7b31ddd _ZN10fastdeploy10OrtBackendD1Ev (libMaaDerpLearning.so + 0x131ddd)
                #18 0x00007f31a7b31e69 _ZN10fastdeploy10OrtBackendD0Ev (libMaaDerpLearning.so + 0x131e69)
                #19 0x00007f31a7b27105 _ZN10fastdeploy7RuntimeD2Ev (libMaaDerpLearning.so + 0x127105)
                #20 0x00007f31a7b273d2 _ZNSt15_Sp_counted_ptrIPN10fastdeploy7RuntimeELN9__gnu_cxx12_Lock_policyE2EE10_M_disposeEv (libMaaDerpLearning.so + 0x1273d2)
                #21 0x00007f31a8188859 _ZN10fastdeploy15FastDeployModelD1Ev (libMaaCore.so + 0x188859)

For more technical details:

  1. We use fastdeploy_ppocr in https://github.com/MaaAssistantArknights/MaaAssistantArknights/blob/0ae92d0de5f83a231d906f8e18ad99764ebab67e/src/MaaCore/Config/Miscellaneous/OcrPack.cpp#L124 , which creates two instances of fastdeploy::Runtime.
  2. Each fastdeploy::Runtime creates an Ort::Session in https://github.com/MaaAssistantArknights/FastDeploy/blob/master/fastdeploy/backends/ort/ort_backend.cc
  3. When the program exits normally with status 0, the "driver shutting down" error occurs.

Could this be because each Ort::Session instance owns its own instance of the CUDA driver, but the driver is shut down globally when the first instance is destroyed, so the second instance then tries to shut down a CUDA driver that is already shut down?
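The ordering concern raised here can be pictured with a small model. This is only a sketch under stated assumptions: `FakeDriver`, `FakeSession` and `teardown` are hypothetical stand-ins invented for illustration, not ONNX Runtime or CUDA types, and the thread does not confirm that this is the actual root cause. It only shows that releasing a resource after the thing it depends on has shut down fails, while releasing it earlier succeeds.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for a process-wide driver; not a CUDA or ONNX Runtime type.
struct FakeDriver {
    bool alive = true;
    void shutdown() { alive = false; }
};

// Hypothetical stand-in for a session that frees driver-backed memory on release.
class FakeSession {
public:
    explicit FakeSession(const FakeDriver& driver) : driver_(&driver) {}

    // Returns true if the release happened while the driver was still alive,
    // false if it came too late (the situation modelled after the reports above).
    bool release() {
        assert(!released_ && "release() must be called at most once");
        released_ = true;
        return driver_->alive;
    }

private:
    const FakeDriver* driver_;
    bool released_ = false;
};

// Creates two sessions on one driver and tears everything down in the chosen order.
// Returns true only if every session was released before the driver shut down.
bool teardown(bool release_sessions_first) {
    FakeDriver driver;
    std::vector<std::unique_ptr<FakeSession>> sessions;
    sessions.push_back(std::make_unique<FakeSession>(driver));
    sessions.push_back(std::make_unique<FakeSession>(driver));

    if (!release_sessions_first) {
        driver.shutdown();  // driver goes away while sessions still hold resources
    }

    bool all_ok = true;
    for (auto& session : sessions) {
        all_ok = session->release() && all_ok;  // release() always runs
    }

    if (release_sessions_first) {
        driver.shutdown();  // safe: nothing depends on the driver any more
    }
    return all_ok;
}
```

In this model, `teardown(true)` corresponds to destroying every session explicitly before the process starts exiting, and `teardown(false)` corresponds to sessions that outlive the driver. Whether the real failure follows this pattern would need to be checked against the stack trace above.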

airstillblue commented 1 year ago

I ran into the same problem. The program ends with:

terminate called after throwing an instance of 'onnxruntime::OnnxRuntimeException'
  what():  /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:122 bool onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*) [with ERRTYPE = cudaError; bool THRW = true] /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:116 bool onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*) [with ERRTYPE = cudaError; bool THRW = true] CUDA failure 4: driver shutting down ; GPU=-2130784471 ; hostname=dev-audioaihcb1 ; expr=cudaEventSynchronize(e);

The ONNX Runtime version is onnxruntime-linux-x64-gpu-1.12.0.

LLsmile commented 12 months ago

onnxruntime-linux-x64-gpu-1.16.3 has the same problem.

horror-proton commented 8 months ago

Debugging with breakpoints on cudaFreeHost and cudaMallocHost

Long text

```
(gdb) run
Starting program: /usr/bin/maa run main
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
[2024-02-21 11:27:57 WARN ] Hot update resource directory not found!
[New Thread 0x7fff58e006c0 (LWP 22641)]
[New Thread 0x7fff584006c0 (LWP 22642)]
[Thread 0x7fff584006c0 (LWP 22642) exited]
[New Thread 0x7fff57a006c0 (LWP 22643)]
[New Thread 0x7fff570006c0 (LWP 22644)]
[New Thread 0x7fff566006c0 (LWP 22645)]
[Detaching after fork from child process 22646]
[Detaching after fork from child process 22651]
[Detaching after fork from child process 22658]
[Detaching after fork from child process 22666]
[Detaching after fork from child process 22667]
[Detaching after fork from child process 22669]
[Detaching after fork from child process 22673]
[Detaching after fork from child process 22692]
[Detaching after fork from child process 22702]
[New Thread 0x7fff4fe006c0 (LWP 22712)]
[New Thread 0x7fff4f8006c0 (LWP 22713)]
[New Thread 0x7fff4f2006c0 (LWP 22714)]
[New Thread 0x7fff4e6006c0 (LWP 22716)]
[New Thread 0x7fff4ec006c0 (LWP 22715)]
[New Thread 0x7fff4e0006c0 (LWP 22717)]
[New Thread 0x7fff4da006c0 (LWP 22718)]
[New Thread 0x7fff4ce006c0 (LWP 22720)]
[New Thread 0x7fff4d4006c0 (LWP 22719)]
[New Thread 0x7fff47e006c0 (LWP 22721)]
[New Thread 0x7fff4c8006c0 (LWP 22722)]
[Detaching after fork from child process 22723]
[Detaching after fork from child process 22733]
[Detaching after fork from child process 22758]
[Detaching after fork from child process 22769]
[New Thread 0x7fff556006c0 (LWP 22780)]
[New Thread 0x7fff54c006c0 (LWP 22784)]
[New Thread 0x7fff3ec006c0 (LWP 22785)]
[New Thread 0x7fff3e2006c0 (LWP 22786)]
[New Thread 0x7fff3d8006c0 (LWP 22787)]
[New Thread 0x7fff3ce006c0 (LWP 22788)]
[New Thread 0x7fff37e006c0 (LWP 22789)]
[New Thread 0x7fff374006c0 (LWP 22790)]
[New Thread 0x7fff36a006c0 (LWP 22791)]
[New Thread 0x7fff360006c0 (LWP 22792)]
[New Thread 0x7fff356006c0 (LWP 22793)]
[New Thread 
0x7fff34c006c0 (LWP 22794)] [New Thread 0x7fff2fe006c0 (LWP 22795)] [New Thread 0x7fff2f4006c0 (LWP 22796)] [Switching to Thread 0x7fff566006c0 (LWP 22645)] Thread 6 "maa working" hit Breakpoint 2, onnxruntime::BFCArena::Extend (this=this@entry=0x7fff4292ebc0, rounded_bytes=rounded_bytes@entry=256) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99 99 Status BFCArena::Extend(size_t rounded_bytes) { #0 onnxruntime::BFCArena::Extend (this=this@entry=0x7fff4292ebc0, rounded_bytes=rounded_bytes@entry=256) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99 #1 0x00007ffff51bbfea in onnxruntime::BFCArena::AllocateRawInternal(unsigned long, bool, onnxruntime::Stream*, bool, std::function) (this=0x7fff4292ebc0, num_bytes=24, dump_log_on_failure=dump_log_on_failure@entry=false, stream=stream@entry=0x0, enable_cross_stream_reusing=enable_cross_stream_reusing@entry=false, wait_fn=...) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:351 #2 0x00007ffff51bc718 in onnxruntime::BFCArena::Alloc (this=, size=) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:272 #3 0x00007ffff524f782 in onnxruntime::Tensor::Tensor (this=this@entry=0x7fff434252d0, p_type=p_type@entry=0x7ffff5b343a0 ::Type()::prim_data_type>, shape=..., allocator=std::shared_ptr (use count 3, weak count 0) = {...}, strides=...) 
at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/tensor.cc:72 $73 = {_vptr.IAllocator = 0x7ffee6133268 , memory_info_ = {name = 0x7ffeb403559c "Cuda", id = 0, mem_type = OrtMemTypeDefault, alloc_type = OrtDeviceAllocator, device = {static CPU = 0 '\000', static GPU = 1 '\001', static FPGA = 2 '\002', static NPU = 3 '\003', device_type = 1, memory_type = 0, device_id = 0}}} Thread 6 "maa working" hit Breakpoint 2, onnxruntime::BFCArena::Extend (this=this@entry=0x7fff429315e0, rounded_bytes=rounded_bytes@entry=256) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99 99 Status BFCArena::Extend(size_t rounded_bytes) { #0 onnxruntime::BFCArena::Extend (this=this@entry=0x7fff429315e0, rounded_bytes=rounded_bytes@entry=256) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99 #1 0x00007ffff51bbfea in onnxruntime::BFCArena::AllocateRawInternal(unsigned long, bool, onnxruntime::Stream*, bool, std::function) (this=0x7fff429315e0, num_bytes=24, dump_log_on_failure=dump_log_on_failure@entry=false, stream=stream@entry=0x0, enable_cross_stream_reusing=enable_cross_stream_reusing@entry=false, wait_fn=...) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:351 #2 0x00007ffff51bc718 in onnxruntime::BFCArena::Alloc (this=, size=) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:272 #3 0x00007ffff524f782 in onnxruntime::Tensor::Tensor (this=this@entry=0x7fff43416c60, p_type=0x7ffff5b343a0 ::Type()::prim_data_type>, shape=..., allocator=std::shared_ptr (use count 3, weak count 0) = {...}, strides=...) 
at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/tensor.cc:72 $74 = {_vptr.IAllocator = 0x7ffff5ac1b58 , memory_info_ = {name = 0x7ffff5666be3 "Cpu", id = 0, mem_type = OrtMemTypeDefault, alloc_type = OrtDeviceAllocator, device = {static CPU = 0 '\000', static GPU = 1 '\001', static FPGA = 2 '\002', static NPU = 3 '\003', device_type = 0, memory_type = 0, device_id = 0}}} Thread 6 "maa working" hit Breakpoint 2, onnxruntime::BFCArena::Extend (this=this@entry=0x7fff4292ebc0, rounded_bytes=rounded_bytes@entry=75264) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99 99 Status BFCArena::Extend(size_t rounded_bytes) { #0 onnxruntime::BFCArena::Extend (this=this@entry=0x7fff4292ebc0, rounded_bytes=rounded_bytes@entry=75264) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99 #1 0x00007ffff51bbfea in onnxruntime::BFCArena::AllocateRawInternal(unsigned long, bool, onnxruntime::Stream*, bool, std::function) (this=0x7fff4292ebc0, num_bytes=75264, dump_log_on_failure=dump_log_on_failure@entry=false, stream=stream@entry=0x0, enable_cross_stream_reusing=enable_cross_stream_reusing@entry=false, wait_fn=...) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:351 #2 0x00007ffff51bc718 in onnxruntime::BFCArena::Alloc (this=, size=) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:272 #3 0x00007ffff524f782 in onnxruntime::Tensor::Tensor (this=this@entry=0x7fff42878fc0, p_type=p_type@entry=0x7ffff5b343a0 ::Type()::prim_data_type>, shape=..., allocator=std::shared_ptr (use count 58, weak count 0) = {...}, strides=...) 
at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/tensor.cc:72 $75 = {_vptr.IAllocator = 0x7ffee6133268 , memory_info_ = {name = 0x7ffeb403559c "Cuda", id = 0, mem_type = OrtMemTypeDefault, alloc_type = OrtDeviceAllocator, device = {static CPU = 0 '\000', static GPU = 1 '\001', static FPGA = 2 '\002', static NPU = 3 '\003', device_type = 1, memory_type = 0, device_id = 0}}} [New Thread 0x7fff272006c0 (LWP 22797)] [New Thread 0x7fff268006c0 (LWP 22798)] [New Thread 0x7fff25e006c0 (LWP 22799)] [New Thread 0x7fff254006c0 (LWP 22800)] [New Thread 0x7fff24a006c0 (LWP 22801)] [New Thread 0x7fff1fe006c0 (LWP 22802)] [New Thread 0x7fff1f4006c0 (LWP 22803)] [New Thread 0x7fff1ea006c0 (LWP 22804)] [New Thread 0x7fff1e0006c0 (LWP 22805)] [New Thread 0x7fff1d6006c0 (LWP 22806)] [New Thread 0x7fff1cc006c0 (LWP 22807)] 2024-02-21 11:28:14.625151011 [W:onnxruntime:, session_state.cc:1162 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf. 2024-02-21 11:28:14.625177901 [W:onnxruntime:, session_state.cc:1164 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments. 
Thread 6 "maa working" hit Breakpoint 2, onnxruntime::BFCArena::Extend (this=this@entry=0x7fff42f64d10, rounded_bytes=rounded_bytes@entry=256) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99 99 Status BFCArena::Extend(size_t rounded_bytes) { #0 onnxruntime::BFCArena::Extend (this=this@entry=0x7fff42f64d10, rounded_bytes=rounded_bytes@entry=256) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99 #1 0x00007ffff51bbfea in onnxruntime::BFCArena::AllocateRawInternal(unsigned long, bool, onnxruntime::Stream*, bool, std::function) (this=0x7fff42f64d10, num_bytes=4, dump_log_on_failure=dump_log_on_failure@entry=false, stream=stream@entry=0x0, enable_cross_stream_reusing=enable_cross_stream_reusing@entry=false, wait_fn=...) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:351 #2 0x00007ffff51bc718 in onnxruntime::BFCArena::Alloc (this=, size=) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:272 #3 0x00007ffff524f782 in onnxruntime::Tensor::Tensor (this=this@entry=0x7fff42f63b50, p_type=p_type@entry=0x7ffff5b343a0 ::Type()::prim_data_type>, shape=..., allocator=std::shared_ptr (use count 3, weak count 0) = {...}, strides=...) 
at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/tensor.cc:72 $76 = {_vptr.IAllocator = 0x7ffee6133268 , memory_info_ = {name = 0x7ffeb403559c "Cuda", id = 0, mem_type = OrtMemTypeDefault, alloc_type = OrtDeviceAllocator, device = {static CPU = 0 '\000', static GPU = 1 '\001', static FPGA = 2 '\002', static NPU = 3 '\003', device_type = 1, memory_type = 0, device_id = 0}}} Thread 6 "maa working" hit Breakpoint 2, onnxruntime::BFCArena::Extend (this=this@entry=0x7fff42f658d0, rounded_bytes=rounded_bytes@entry=256) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99 99 Status BFCArena::Extend(size_t rounded_bytes) { #0 onnxruntime::BFCArena::Extend (this=this@entry=0x7fff42f658d0, rounded_bytes=rounded_bytes@entry=256) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99 #1 0x00007ffff51bbfea in onnxruntime::BFCArena::AllocateRawInternal(unsigned long, bool, onnxruntime::Stream*, bool, std::function) (this=0x7fff42f658d0, num_bytes=4, dump_log_on_failure=dump_log_on_failure@entry=false, stream=stream@entry=0x0, enable_cross_stream_reusing=enable_cross_stream_reusing@entry=false, wait_fn=...) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:351 #2 0x00007ffff51bc718 in onnxruntime::BFCArena::Alloc (this=, size=) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:272 #3 0x00007ffff524f782 in onnxruntime::Tensor::Tensor (this=this@entry=0x7fff42e4f2c0, p_type=0x7ffff5b343a0 ::Type()::prim_data_type>, shape=..., allocator=std::shared_ptr (use count 3, weak count 0) = {...}, strides=...) 
at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/tensor.cc:72 $77 = {_vptr.IAllocator = 0x7ffff5ac1b58 , memory_info_ = {name = 0x7ffff5666be3 "Cpu", id = 0, mem_type = OrtMemTypeDefault, alloc_type = OrtDeviceAllocator, device = {static CPU = 0 '\000', static GPU = 1 '\001', static FPGA = 2 '\002', static NPU = 3 '\003', device_type = 0, memory_type = 0, device_id = 0}}} Thread 6 "maa working" hit Breakpoint 2, onnxruntime::BFCArena::Extend (this=this@entry=0x7fff42f64d10, rounded_bytes=rounded_bytes@entry=1179648) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99 99 Status BFCArena::Extend(size_t rounded_bytes) { #0 onnxruntime::BFCArena::Extend (this=this@entry=0x7fff42f64d10, rounded_bytes=rounded_bytes@entry=1179648) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99 #1 0x00007ffff51bbfea in onnxruntime::BFCArena::AllocateRawInternal(unsigned long, bool, onnxruntime::Stream*, bool, std::function) (this=0x7fff42f64d10, num_bytes=1179648, dump_log_on_failure=dump_log_on_failure@entry=false, stream=stream@entry=0x0, enable_cross_stream_reusing=enable_cross_stream_reusing@entry=false, wait_fn=...) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:351 #2 0x00007ffff51bc718 in onnxruntime::BFCArena::Alloc (this=, size=) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:272 #3 0x00007ffff524f782 in onnxruntime::Tensor::Tensor (this=this@entry=0x7fff42d447f0, p_type=p_type@entry=0x7ffff5b343a0 ::Type()::prim_data_type>, shape=..., allocator=std::shared_ptr (use count 19, weak count 0) = {...}, strides=...) 
at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/tensor.cc:72 $78 = {_vptr.IAllocator = 0x7ffee6133268 , memory_info_ = {name = 0x7ffeb403559c "Cuda", id = 0, mem_type = OrtMemTypeDefault, alloc_type = OrtDeviceAllocator, device = {static CPU = 0 '\000', static GPU = 1 '\001', static FPGA = 2 '\002', static NPU = 3 '\003', device_type = 1, memory_type = 0, device_id = 0}}} Thread 6 "maa working" hit Breakpoint 2, onnxruntime::BFCArena::Extend (this=this@entry=0x7fff42f658d0, rounded_bytes=rounded_bytes@entry=1179648) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99 99 Status BFCArena::Extend(size_t rounded_bytes) { #0 onnxruntime::BFCArena::Extend (this=this@entry=0x7fff42f658d0, rounded_bytes=rounded_bytes@entry=1179648) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99 #1 0x00007ffff51bbfea in onnxruntime::BFCArena::AllocateRawInternal(unsigned long, bool, onnxruntime::Stream*, bool, std::function) (this=0x7fff42f658d0, num_bytes=1179648, dump_log_on_failure=dump_log_on_failure@entry=false, stream=stream@entry=0x0, enable_cross_stream_reusing=enable_cross_stream_reusing@entry=false, wait_fn=...) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:351 #2 0x00007ffff51bc718 in onnxruntime::BFCArena::Alloc (this=, size=) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:272 #3 0x00007ffff524f782 in onnxruntime::Tensor::Tensor (this=this@entry=0x7ffd785f1a30, p_type=0x7ffff5b343a0 ::Type()::prim_data_type>, shape=..., allocator=std::shared_ptr (use count 5, weak count 0) = {...}, strides=...) 
at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/tensor.cc:72 $79 = {_vptr.IAllocator = 0x7ffff5ac1b58 , memory_info_ = {name = 0x7ffff5666be3 "Cpu", id = 0, mem_type = OrtMemTypeDefault, alloc_type = OrtDeviceAllocator, device = {static CPU = 0 '\000', static GPU = 1 '\001', static FPGA = 2 '\002', static NPU = 3 '\003', device_type = 0, memory_type = 0, device_id = 0}}} Thread 6 "maa working" hit Breakpoint 2, onnxruntime::BFCArena::Extend (this=this@entry=0x7fff42f64d10, rounded_bytes=rounded_bytes@entry=1048576) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99 99 Status BFCArena::Extend(size_t rounded_bytes) { #0 onnxruntime::BFCArena::Extend (this=this@entry=0x7fff42f64d10, rounded_bytes=rounded_bytes@entry=1048576) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99 #1 0x00007ffff51bbfea in onnxruntime::BFCArena::AllocateRawInternal(unsigned long, bool, onnxruntime::Stream*, bool, std::function) (this=0x7fff42f64d10, num_bytes=1048576, dump_log_on_failure=dump_log_on_failure@entry=false, stream=stream@entry=0x0, enable_cross_stream_reusing=enable_cross_stream_reusing@entry=false, wait_fn=...) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:351 #2 0x00007ffff51bc718 in onnxruntime::BFCArena::Alloc (this=, size=) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:272 #3 0x00007ffff524f782 in onnxruntime::Tensor::Tensor (this=this@entry=0x7fff42dae600, p_type=p_type@entry=0x7ffff5b343a0 ::Type()::prim_data_type>, shape=..., allocator=std::shared_ptr (use count 36, weak count 0) = {...}, strides=...) 
at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/tensor.cc:72 $80 = {_vptr.IAllocator = 0x7ffee6133268 , memory_info_ = {name = 0x7ffeb403559c "Cuda", id = 0, mem_type = OrtMemTypeDefault, alloc_type = OrtDeviceAllocator, device = {static CPU = 0 '\000', static GPU = 1 '\001', static FPGA = 2 '\002', static NPU = 3 '\003', device_type = 1, memory_type = 0, device_id = 0}}} Thread 6 "maa working" hit Breakpoint 2, onnxruntime::BFCArena::Extend (this=this@entry=0x7fff42f64d10, rounded_bytes=rounded_bytes@entry=2359296) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99 99 Status BFCArena::Extend(size_t rounded_bytes) { #0 onnxruntime::BFCArena::Extend (this=this@entry=0x7fff42f64d10, rounded_bytes=rounded_bytes@entry=2359296) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99 #1 0x00007ffff51bbfea in onnxruntime::BFCArena::AllocateRawInternal(unsigned long, bool, onnxruntime::Stream*, bool, std::function) (this=0x7fff42f64d10, num_bytes=2359296, dump_log_on_failure=dump_log_on_failure@entry=false, stream=stream@entry=0x0, enable_cross_stream_reusing=enable_cross_stream_reusing@entry=false, wait_fn=...) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:351 #2 0x00007ffff51bc718 in onnxruntime::BFCArena::Alloc (this=, size=) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:272 #3 0x00007ffff524f782 in onnxruntime::Tensor::Tensor (this=this@entry=0x7ffd785296b0, p_type=p_type@entry=0x7ffff5b343a0 ::Type()::prim_data_type>, shape=..., allocator=std::shared_ptr (use count 58, weak count 0) = {...}, strides=...) 
at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/tensor.cc:72 $81 = {_vptr.IAllocator = 0x7ffee6133268 , memory_info_ = {name = 0x7ffeb403559c "Cuda", id = 0, mem_type = OrtMemTypeDefault, alloc_type = OrtDeviceAllocator, device = {static CPU = 0 '\000', static GPU = 1 '\001', static FPGA = 2 '\002', static NPU = 3 '\003', device_type = 1, memory_type = 0, device_id = 0}}} Thread 6 "maa working" hit Breakpoint 2, onnxruntime::BFCArena::Extend (this=this@entry=0x7fff42f658d0, rounded_bytes=rounded_bytes@entry=2359296) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99 99 Status BFCArena::Extend(size_t rounded_bytes) { #0 onnxruntime::BFCArena::Extend (this=this@entry=0x7fff42f658d0, rounded_bytes=rounded_bytes@entry=2359296) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99 #1 0x00007ffff51bbfea in onnxruntime::BFCArena::AllocateRawInternal(unsigned long, bool, onnxruntime::Stream*, bool, std::function) (this=0x7fff42f658d0, num_bytes=2359296, dump_log_on_failure=dump_log_on_failure@entry=false, stream=stream@entry=0x0, enable_cross_stream_reusing=enable_cross_stream_reusing@entry=false, wait_fn=...) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:351 #2 0x00007ffff51bc718 in onnxruntime::BFCArena::Alloc (this=, size=) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:272 #3 0x00007ffff524f782 in onnxruntime::Tensor::Tensor (this=this@entry=0x7ffd78529b70, p_type=0x7ffff5b343a0 ::Type()::prim_data_type>, shape=..., allocator=std::shared_ptr (use count 7, weak count 0) = {...}, strides=...) 
at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/tensor.cc:72 $82 = {_vptr.IAllocator = 0x7ffff5ac1b58 , memory_info_ = {name = 0x7ffff5666be3 "Cpu", id = 0, mem_type = OrtMemTypeDefault, alloc_type = OrtDeviceAllocator, device = {static CPU = 0 '\000', static GPU = 1 '\001', static FPGA = 2 '\002', static NPU = 3 '\003', device_type = 0, memory_type = 0, device_id = 0}}} Thread 6 "maa working" hit Breakpoint 2, onnxruntime::BFCArena::Extend (this=this@entry=0x7fff4292ebc0, rounded_bytes=rounded_bytes@entry=2715648) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99 99 Status BFCArena::Extend(size_t rounded_bytes) { #0 onnxruntime::BFCArena::Extend (this=this@entry=0x7fff4292ebc0, rounded_bytes=rounded_bytes@entry=2715648) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99 #1 0x00007ffff51bbfea in onnxruntime::BFCArena::AllocateRawInternal(unsigned long, bool, onnxruntime::Stream*, bool, std::function) (this=this@entry=0x7fff4292ebc0, num_bytes=num_bytes@entry=2715648, dump_log_on_failure=dump_log_on_failure@entry=false, stream=stream@entry=0x7fff40774250, enable_cross_stream_reusing=, wait_fn=...) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:351 #2 0x00007ffff51bc7ea in onnxruntime::StreamAwareArena::AllocOnStream(unsigned long, onnxruntime::Stream*, std::function) (this=this@entry=0x7fff4292ebc0, size=2715648, current_stream=current_stream@entry=0x7fff40774250, wait_fn=...) 
at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:871 #3 0x00007ffff5268eb5 in onnxruntime::utils::AllocateHelper (target_mlvalue=..., source_mlvalue=..., target_stream=0x7fff40774250, allocator=std::shared_ptr (use count 150, weak count 0) = {...}) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/utils.cc:91 $83 = {_vptr.IAllocator = 0x7ffee6133268 , memory_info_ = {name = 0x7ffeb403559c "Cuda", id = 0, mem_type = OrtMemTypeDefault, alloc_type = OrtDeviceAllocator, device = {static CPU = 0 '\000', static GPU = 1 '\001', static FPGA = 2 '\002', static NPU = 3 '\003', device_type = 1, memory_type = 0, device_id = 0}}} Thread 6 "maa working" hit Breakpoint 2, onnxruntime::BFCArena::Extend (this=this@entry=0x7fff4292ebc0, rounded_bytes=rounded_bytes@entry=1810432) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99 99 Status BFCArena::Extend(size_t rounded_bytes) { #0 onnxruntime::BFCArena::Extend (this=this@entry=0x7fff4292ebc0, rounded_bytes=rounded_bytes@entry=1810432) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99 #1 0x00007ffff51bbfea in onnxruntime::BFCArena::AllocateRawInternal(unsigned long, bool, onnxruntime::Stream*, bool, std::function) (this=this@entry=0x7fff4292ebc0, num_bytes=num_bytes@entry=1810432, dump_log_on_failure=dump_log_on_failure@entry=false, stream=stream@entry=0x7fff40774250, enable_cross_stream_reusing=, wait_fn=...) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:351 #2 0x00007ffff51bc7ea in onnxruntime::StreamAwareArena::AllocOnStream(unsigned long, onnxruntime::Stream*, std::function) (this=this@entry=0x7fff4292ebc0, size=size@entry=1810432, current_stream=current_stream@entry=0x7fff40774250, wait_fn=...) 
at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:871 #3 0x00007ffff51dcddf in onnxruntime::ExecutionFrame::AllocateMLValueTensorSelfOwnBufferHelper (this=this@entry=0x7fff565fd088, ort_value=..., ort_value_index=ort_value_index@entry=3, element_type=0x7ffff5b343a0 ::Type()::prim_data_type>, location=..., shape=...) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/execution_frame.cc:587 $84 = {_vptr.IAllocator = 0x7ffee6133268 , memory_info_ = {name = 0x7ffeb403559c "Cuda", id = 0, mem_type = OrtMemTypeDefault, alloc_type = OrtDeviceAllocator, device = {static CPU = 0 '\000', static GPU = 1 '\001', static FPGA = 2 '\002', static NPU = 3 '\003', device_type = 1, memory_type = 0, device_id = 0}}} Thread 6 "maa working" hit Breakpoint 2, onnxruntime::BFCArena::Extend (this=this@entry=0x7fff4292ebc0, rounded_bytes=rounded_bytes@entry=7241728) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99 99 Status BFCArena::Extend(size_t rounded_bytes) { #0 onnxruntime::BFCArena::Extend (this=this@entry=0x7fff4292ebc0, rounded_bytes=rounded_bytes@entry=7241728) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99 #1 0x00007ffff51bbfea in onnxruntime::BFCArena::AllocateRawInternal(unsigned long, bool, onnxruntime::Stream*, bool, std::function) (this=this@entry=0x7fff4292ebc0, num_bytes=num_bytes@entry=7241728, dump_log_on_failure=dump_log_on_failure@entry=false, stream=stream@entry=0x7fff40774250, enable_cross_stream_reusing=, wait_fn=...) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:351 #2 0x00007ffff51bc7ea in onnxruntime::StreamAwareArena::AllocOnStream(unsigned long, onnxruntime::Stream*, std::function) (this=this@entry=0x7fff4292ebc0, size=size@entry=7241728, current_stream=current_stream@entry=0x7fff40774250, wait_fn=...) 
at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:871 #3 0x00007ffff51dcddf in onnxruntime::ExecutionFrame::AllocateMLValueTensorSelfOwnBufferHelper (this=this@entry=0x7fff565fd088, ort_value=..., ort_value_index=ort_value_index@entry=328, element_type=0x7ffff5b343a0 ::Type()::prim_data_type>, location=..., shape=...) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/execution_frame.cc:587 $85 = {_vptr.IAllocator = 0x7ffee6133268 , memory_info_ = {name = 0x7ffeb403559c "Cuda", id = 0, mem_type = OrtMemTypeDefault, alloc_type = OrtDeviceAllocator, device = {static CPU = 0 '\000', static GPU = 1 '\001', static FPGA = 2 '\002', static NPU = 3 '\003', device_type = 1, memory_type = 0, device_id = 0}}} Thread 6 "maa working" hit Breakpoint 2, onnxruntime::BFCArena::Extend (this=this@entry=0x7fff4292ebc0, rounded_bytes=rounded_bytes@entry=5431296) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99 99 Status BFCArena::Extend(size_t rounded_bytes) { #0 onnxruntime::BFCArena::Extend (this=this@entry=0x7fff4292ebc0, rounded_bytes=rounded_bytes@entry=5431296) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99 #1 0x00007ffff51bbfea in onnxruntime::BFCArena::AllocateRawInternal(unsigned long, bool, onnxruntime::Stream*, bool, std::function) (this=this@entry=0x7fff4292ebc0, num_bytes=num_bytes@entry=5431296, dump_log_on_failure=dump_log_on_failure@entry=false, stream=stream@entry=0x7fff40774250, enable_cross_stream_reusing=, wait_fn=...) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:351 #2 0x00007ffff51bc7ea in onnxruntime::StreamAwareArena::AllocOnStream(unsigned long, onnxruntime::Stream*, std::function) (this=this@entry=0x7fff4292ebc0, size=size@entry=5431296, current_stream=current_stream@entry=0x7fff40774250, wait_fn=...) 
at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:871 #3 0x00007ffff51dcddf in onnxruntime::ExecutionFrame::AllocateMLValueTensorSelfOwnBufferHelper (this=this@entry=0x7fff565fd088, ort_value=..., ort_value_index=ort_value_index@entry=244, element_type=0x7ffff5b343a0 ::Type()::prim_data_type>, location=..., shape=...) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/execution_frame.cc:587 $86 = {_vptr.IAllocator = 0x7ffee6133268 , memory_info_ = {name = 0x7ffeb403559c "Cuda", id = 0, mem_type = OrtMemTypeDefault, alloc_type = OrtDeviceAllocator, device = {static CPU = 0 '\000', static GPU = 1 '\001', static FPGA = 2 '\002', static NPU = 3 '\003', device_type = 1, memory_type = 0, device_id = 0}}} Thread 6 "maa working" hit Breakpoint 2, onnxruntime::BFCArena::Extend (this=this@entry=0x7fff42931000, rounded_bytes=rounded_bytes@entry=256) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99 99 Status BFCArena::Extend(size_t rounded_bytes) { #0 onnxruntime::BFCArena::Extend (this=this@entry=0x7fff42931000, rounded_bytes=rounded_bytes@entry=256) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99 #1 0x00007ffff51bbfea in onnxruntime::BFCArena::AllocateRawInternal(unsigned long, bool, onnxruntime::Stream*, bool, std::function) (this=0x7fff42931000, num_bytes=32, dump_log_on_failure=dump_log_on_failure@entry=false, stream=stream@entry=0x0, enable_cross_stream_reusing=enable_cross_stream_reusing@entry=false, wait_fn=...) 
at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:351 #2 0x00007ffff51bc718 in onnxruntime::BFCArena::Alloc (this=, size=) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:272 #3 0x00007ffff4a0afe8 in onnxruntime::ProviderHostImpl::Allocator__AllocateBufferWithOptions(onnxruntime::IAllocator&, unsigned long, bool, onnxruntime::Stream*, std::function) (this=, allocator=..., size=32, use_reserve=, stream=0x0, wait_fn=...) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/session/provider_bridge_ort.cc:1057 $87 = {_vptr.IAllocator = 0x7ffee61332e8 , memory_info_ = {name = 0x7ffeb4035591 "CudaPinned", id = 0, mem_type = OrtMemTypeCPUOutput, alloc_type = OrtDeviceAllocator, device = {static CPU = 0 '\000', static GPU = 1 '\001', static FPGA = 2 '\002', static NPU = 3 '\003', device_type = 0, memory_type = 1, device_id = 0}}} Thread 6 "maa working" hit Breakpoint 5.2, 0x00007fffee456294 in cudaMallocHost () from /opt/cuda/lib64/libcudart.so.12 (gdb) fin Run till exit from #0 0x00007fffee456294 in cudaMallocHost () from /opt/cuda/lib64/libcudart.so.12 0x00007ffeb34b3a89 in onnxruntime::CUDAPinnedAllocator::Alloc (this=, size=) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/providers/cuda/cuda_allocator.cc:92 92 CUDA_CALL_THROW(cudaMallocHost((void**)&p, size)); (gdb) print p $88 = (void *) 0x7ffddca00600 (gdb) continue Continuing. 
Thread 6 "maa working" hit Breakpoint 2, onnxruntime::BFCArena::Extend (this=this@entry=0x7fff4292ebc0, rounded_bytes=rounded_bytes@entry=33554432) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99 99 Status BFCArena::Extend(size_t rounded_bytes) { #0 onnxruntime::BFCArena::Extend (this=this@entry=0x7fff4292ebc0, rounded_bytes=rounded_bytes@entry=33554432) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99 #1 0x00007ffff51bbfea in onnxruntime::BFCArena::AllocateRawInternal(unsigned long, bool, onnxruntime::Stream*, bool, std::function) (this=this@entry=0x7fff4292ebc0, num_bytes=num_bytes@entry=33554432, dump_log_on_failure=dump_log_on_failure@entry=false, stream=stream@entry=0x7fff40774250, enable_cross_stream_reusing=, wait_fn=...) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:351 #2 0x00007ffff51bc7ea in onnxruntime::StreamAwareArena::AllocOnStream(unsigned long, onnxruntime::Stream*, std::function) (this=this@entry=0x7fff4292ebc0, size=size@entry=33554432, current_stream=current_stream@entry=0x7fff40774250, wait_fn=...) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:871 #3 0x00007ffff51b2bc4 in onnxruntime::AllocateBufferWithOptions(onnxruntime::IAllocator&, unsigned long, bool, onnxruntime::Stream*, std::function) (alloc=..., size=33554432, use_reserve=, stream=0x7fff40774250, wait_fn=...) 
at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/allocator.cc:121 $89 = {_vptr.IAllocator = 0x7ffee6133268 , memory_info_ = {name = 0x7ffeb403559c "Cuda", id = 0, mem_type = OrtMemTypeDefault, alloc_type = OrtDeviceAllocator, device = {static CPU = 0 '\000', static GPU = 1 '\001', static FPGA = 2 '\002', static NPU = 3 '\003', device_type = 1, memory_type = 0, device_id = 0}}} Thread 6 "maa working" hit Breakpoint 2, onnxruntime::BFCArena::Extend (this=this@entry=0x7fff42f64d10, rounded_bytes=rounded_bytes@entry=24320) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99 99 Status BFCArena::Extend(size_t rounded_bytes) { #0 onnxruntime::BFCArena::Extend (this=this@entry=0x7fff42f64d10, rounded_bytes=rounded_bytes@entry=24320) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99 #1 0x00007ffff51bbfea in onnxruntime::BFCArena::AllocateRawInternal(unsigned long, bool, onnxruntime::Stream*, bool, std::function) (this=this@entry=0x7fff42f64d10, num_bytes=num_bytes@entry=24192, dump_log_on_failure=dump_log_on_failure@entry=false, stream=stream@entry=0x7ffd07161090, enable_cross_stream_reusing=, wait_fn=...) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:351 #2 0x00007ffff51bc7ea in onnxruntime::StreamAwareArena::AllocOnStream(unsigned long, onnxruntime::Stream*, std::function) (this=this@entry=0x7fff42f64d10, size=size@entry=24192, current_stream=current_stream@entry=0x7ffd07161090, wait_fn=...) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:871 #3 0x00007ffff51b2bc4 in onnxruntime::AllocateBufferWithOptions(onnxruntime::IAllocator&, unsigned long, bool, onnxruntime::Stream*, std::function) (alloc=..., size=24192, use_reserve=, stream=0x7ffd07161090, wait_fn=...) 
at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/allocator.cc:121 $90 = {_vptr.IAllocator = 0x7ffee6133268 , memory_info_ = {name = 0x7ffeb403559c "Cuda", id = 0, mem_type = OrtMemTypeDefault, alloc_type = OrtDeviceAllocator, device = {static CPU = 0 '\000', static GPU = 1 '\001', static FPGA = 2 '\002', static NPU = 3 '\003', device_type = 1, memory_type = 0, device_id = 0}}} Thread 6 "maa working" hit Breakpoint 2, onnxruntime::BFCArena::Extend (this=this@entry=0x7fff42f64d10, rounded_bytes=rounded_bytes@entry=5898240) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99 99 Status BFCArena::Extend(size_t rounded_bytes) { #0 onnxruntime::BFCArena::Extend (this=this@entry=0x7fff42f64d10, rounded_bytes=rounded_bytes@entry=5898240) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99 #1 0x00007ffff51bbfea in onnxruntime::BFCArena::AllocateRawInternal(unsigned long, bool, onnxruntime::Stream*, bool, std::function) (this=this@entry=0x7fff42f64d10, num_bytes=num_bytes@entry=5898240, dump_log_on_failure=dump_log_on_failure@entry=false, stream=stream@entry=0x7ffd07161090, enable_cross_stream_reusing=, wait_fn=...) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:351 #2 0x00007ffff51bc7ea in onnxruntime::StreamAwareArena::AllocOnStream(unsigned long, onnxruntime::Stream*, std::function) (this=this@entry=0x7fff42f64d10, size=size@entry=5898240, current_stream=current_stream@entry=0x7ffd07161090, wait_fn=...) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:871 #3 0x00007ffff51dcddf in onnxruntime::ExecutionFrame::AllocateMLValueTensorSelfOwnBufferHelper (this=this@entry=0x7fff565fd088, ort_value=..., ort_value_index=ort_value_index@entry=35, element_type=0x7ffff5b343a0 ::Type()::prim_data_type>, location=..., shape=...) 
at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/execution_frame.cc:587 $91 = {_vptr.IAllocator = 0x7ffee6133268 , memory_info_ = {name = 0x7ffeb403559c "Cuda", id = 0, mem_type = OrtMemTypeDefault, alloc_type = OrtDeviceAllocator, device = {static CPU = 0 '\000', static GPU = 1 '\001', static FPGA = 2 '\002', static NPU = 3 '\003', device_type = 1, memory_type = 0, device_id = 0}}} Thread 6 "maa working" hit Breakpoint 2, onnxruntime::BFCArena::Extend (this=this@entry=0x7fff42f652f0, rounded_bytes=rounded_bytes@entry=256) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99 99 Status BFCArena::Extend(size_t rounded_bytes) { #0 onnxruntime::BFCArena::Extend (this=this@entry=0x7fff42f652f0, rounded_bytes=rounded_bytes@entry=256) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99 #1 0x00007ffff51bbfea in onnxruntime::BFCArena::AllocateRawInternal(unsigned long, bool, onnxruntime::Stream*, bool, std::function) (this=0x7fff42f652f0, num_bytes=16, dump_log_on_failure=dump_log_on_failure@entry=false, stream=stream@entry=0x0, enable_cross_stream_reusing=enable_cross_stream_reusing@entry=false, wait_fn=...) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:351 #2 0x00007ffff51bc718 in onnxruntime::BFCArena::Alloc (this=, size=) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:272 #3 0x00007ffff4a0afe8 in onnxruntime::ProviderHostImpl::Allocator__AllocateBufferWithOptions(onnxruntime::IAllocator&, unsigned long, bool, onnxruntime::Stream*, std::function) (this=, allocator=..., size=16, use_reserve=, stream=0x0, wait_fn=...) 
at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/session/provider_bridge_ort.cc:1057 $92 = {_vptr.IAllocator = 0x7ffee61332e8 , memory_info_ = {name = 0x7ffeb4035591 "CudaPinned", id = 0, mem_type = OrtMemTypeCPUOutput, alloc_type = OrtDeviceAllocator, device = {static CPU = 0 '\000', static GPU = 1 '\001', static FPGA = 2 '\002', static NPU = 3 '\003', device_type = 0, memory_type = 1, device_id = 0}}} Thread 6 "maa working" hit Breakpoint 5.2, 0x00007fffee456294 in cudaMallocHost () from /opt/cuda/lib64/libcudart.so.12 (gdb) fin Run till exit from #0 0x00007fffee456294 in cudaMallocHost () from /opt/cuda/lib64/libcudart.so.12 0x00007ffeb34b3a89 in onnxruntime::CUDAPinnedAllocator::Alloc (this=, size=) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/providers/cuda/cuda_allocator.cc:92 92 CUDA_CALL_THROW(cudaMallocHost((void**)&p, size)); (gdb) print p $93 = (void *) 0x7ffd19200000 (gdb) continue Continuing. Thread 6 "maa working" hit Breakpoint 2, onnxruntime::BFCArena::Extend (this=this@entry=0x7fff42f64d10, rounded_bytes=rounded_bytes@entry=34504704) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99 99 Status BFCArena::Extend(size_t rounded_bytes) { #0 onnxruntime::BFCArena::Extend (this=this@entry=0x7fff42f64d10, rounded_bytes=rounded_bytes@entry=34504704) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99 #1 0x00007ffff51bbfea in onnxruntime::BFCArena::AllocateRawInternal(unsigned long, bool, onnxruntime::Stream*, bool, std::function) (this=this@entry=0x7fff42f64d10, num_bytes=num_bytes@entry=34504704, dump_log_on_failure=dump_log_on_failure@entry=false, stream=stream@entry=0x7ffd07161090, enable_cross_stream_reusing=, wait_fn=...) 
at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:351 #2 0x00007ffff51bc7ea in onnxruntime::StreamAwareArena::AllocOnStream(unsigned long, onnxruntime::Stream*, std::function) (this=this@entry=0x7fff42f64d10, size=size@entry=34504704, current_stream=current_stream@entry=0x7ffd07161090, wait_fn=...) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:871 #3 0x00007ffff51b2bc4 in onnxruntime::AllocateBufferWithOptions(onnxruntime::IAllocator&, unsigned long, bool, onnxruntime::Stream*, std::function) (alloc=..., size=34504704, use_reserve=, stream=0x7ffd07161090, wait_fn=...) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/allocator.cc:121 $94 = {_vptr.IAllocator = 0x7ffee6133268 , memory_info_ = {name = 0x7ffeb403559c "Cuda", id = 0, mem_type = OrtMemTypeDefault, alloc_type = OrtDeviceAllocator, device = {static CPU = 0 '\000', static GPU = 1 '\001', static FPGA = 2 '\002', static NPU = 3 '\003', device_type = 1, memory_type = 0, device_id = 0}}} Thread 6 "maa working" hit Breakpoint 2, onnxruntime::BFCArena::Extend (this=this@entry=0x7fff42f658d0, rounded_bytes=rounded_bytes@entry=6626048) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99 99 Status BFCArena::Extend(size_t rounded_bytes) { #0 onnxruntime::BFCArena::Extend (this=this@entry=0x7fff42f658d0, rounded_bytes=rounded_bytes@entry=6626048) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:99 #1 0x00007ffff51bbfea in onnxruntime::BFCArena::AllocateRawInternal(unsigned long, bool, onnxruntime::Stream*, bool, std::function) (this=0x7fff42f658d0, num_bytes=6625920, dump_log_on_failure=dump_log_on_failure@entry=false, stream=stream@entry=0x0, enable_cross_stream_reusing=enable_cross_stream_reusing@entry=false, wait_fn=...) 
at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:351 #2 0x00007ffff51bc718 in onnxruntime::BFCArena::Alloc (this=, size=) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:272 #3 0x00007ffff524f782 in onnxruntime::Tensor::Tensor (this=0x7ffd1db128d0, p_type=0x7ffff5b343a0 ::Type()::prim_data_type>, shape=..., allocator=std::shared_ptr (use count 10, weak count 0) = {...}, strides=...) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/tensor.cc:72 $95 = {_vptr.IAllocator = 0x7ffff5ac1b58 , memory_info_ = {name = 0x7ffff5666be3 "Cpu", id = 0, mem_type = OrtMemTypeDefault, alloc_type = OrtDeviceAllocator, device = {static CPU = 0 '\000', static GPU = 1 '\001', static FPGA = 2 '\002', static NPU = 3 '\003', device_type = 0, memory_type = 0, device_id = 0}}} [Detaching after fork from child process 22814] [Detaching after fork from child process 22825] [Detaching after fork from child process 22836] [Detaching after fork from child process 22846] [Detaching after fork from child process 22857] ^C Thread 1 "maa" received signal SIGINT, Interrupt. [Switching to Thread 0x7ffff7423980 (LWP 22640)] 0x00007ffff7606335 in __GI___clock_nanosleep (clock_id=clock_id@entry=0, flags=flags@entry=0, req=0x7fffffffcfa0, rem=0x7fffffffcfa0) at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:48 48 r = INTERNAL_SYSCALL_CANCEL (clock_nanosleep_time64, clock_id, flags, req, (gdb) signal SIGINT Continuing with signal SIGINT. 
[Detaching after fork from child process 22870] [Thread 0x7fff58e006c0 (LWP 22641) exited] [Thread 0x7fff570006c0 (LWP 22644) exited] [Thread 0x7fff57a006c0 (LWP 22643) exited] [Thread 0x7fff566006c0 (LWP 22645) exited] Summary ---------------------------------------- [StartUp] 11:28:05 - 11:29:04 (58s) Completed ---------------------------------------- [Infrast] 11:29:04 - Unfinished ---------------------------------------- [Recruit] Unstarted ---------------------------------------- [Mall] Unstarted ---------------------------------------- [Award] Unstarted Error: Interrupted by user! [Thread 0x7fff4e6006c0 (LWP 22716) exited] [Thread 0x7fff4c8006c0 (LWP 22722) exited] [Thread 0x7fff4f8006c0 (LWP 22713) exited] [Thread 0x7fff47e006c0 (LWP 22721) exited] [Thread 0x7fff4d4006c0 (LWP 22719) exited] [Thread 0x7fff4e0006c0 (LWP 22717) exited] [Thread 0x7fff4f2006c0 (LWP 22714) exited] [Thread 0x7fff4fe006c0 (LWP 22712) exited] [Thread 0x7fff4da006c0 (LWP 22718) exited] [Thread 0x7fff4ce006c0 (LWP 22720) exited] [Thread 0x7fff4ec006c0 (LWP 22715) exited] [Thread 0x7fff24a006c0 (LWP 22801) exited] [Thread 0x7fff254006c0 (LWP 22800) exited] [Thread 0x7fff25e006c0 (LWP 22799) exited] [Thread 0x7fff268006c0 (LWP 22798) exited] [Thread 0x7fff272006c0 (LWP 22797) exited] [Thread 0x7fff1cc006c0 (LWP 22807) exited] [Thread 0x7fff1d6006c0 (LWP 22806) exited] [Thread 0x7fff1e0006c0 (LWP 22805) exited] [Thread 0x7fff1ea006c0 (LWP 22804) exited] [Thread 0x7fff1f4006c0 (LWP 22803) exited] [Thread 0x7fff1fe006c0 (LWP 22802) exited] Thread 1 "maa" hit Breakpoint 3.1, onnxruntime::BFCArena::~BFCArena (this=this@entry=0x7fff42f64d10, __in_chrg=) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:80 80 BFCArena::~BFCArena() { #0 onnxruntime::BFCArena::~BFCArena (this=this@entry=0x7fff42f64d10, __in_chrg=) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:80 #1 0x00007ffff51b403c in 
onnxruntime::StreamAwareArena::~StreamAwareArena (this=0x7fff42f64d10, __in_chrg=) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.h:520 #2 onnxruntime::StreamAwareArena::~StreamAwareArena (this=0x7fff42f64d10, __in_chrg=) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.h:520 #3 0x00007ffff49ecf17 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x7fff42db59d0) at /usr/lib/gcc/x86_64-pc-linux-gnu/12.3.0/include/c++/bits/shared_ptr_base.h:346 $96 = {_vptr.IAllocator = 0x7ffee6133268 , memory_info_ = {name = 0x7ffeb403559c "Cuda", id = 0, mem_type = OrtMemTypeDefault, alloc_type = OrtDeviceAllocator, device = {static CPU = 0 '\000', static GPU = 1 '\001', static FPGA = 2 '\002', static NPU = 3 '\003', device_type = 1, memory_type = 0, device_id = 0}}} Thread 1 "maa" hit Breakpoint 3.2, onnxruntime::BFCArena::~BFCArena (this=0x7fff42f652f0, __in_chrg=) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:80 80 BFCArena::~BFCArena() { #0 onnxruntime::BFCArena::~BFCArena (this=0x7fff42f652f0, __in_chrg=) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:80 #1 0x00007ffff49ecf17 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x7fff42d875d0) at /usr/lib/gcc/x86_64-pc-linux-gnu/12.3.0/include/c++/bits/shared_ptr_base.h:346 #2 std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x7fff42d875d0) at /usr/lib/gcc/x86_64-pc-linux-gnu/12.3.0/include/c++/bits/shared_ptr_base.h:317 #3 0x00007ffff4a5f48e in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x7fff42d8b380, __in_chrg=) at /usr/lib/gcc/x86_64-pc-linux-gnu/12.3.0/include/c++/bits/shared_ptr_base.h:1071 $97 = {_vptr.IAllocator = 0x7ffee61332e8 , memory_info_ = {name = 0x7ffeb4035591 "CudaPinned", id = 0, mem_type = OrtMemTypeCPUOutput, alloc_type = OrtDeviceAllocator, device = 
{static CPU = 0 '\000', static GPU = 1 '\001', static FPGA = 2 '\002', static NPU = 3 '\003', device_type = 0, memory_type = 1, device_id = 0}}} Thread 1 "maa" hit Breakpoint 3.1, onnxruntime::BFCArena::~BFCArena (this=0x7fff42f652f0, __in_chrg=) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:80 80 BFCArena::~BFCArena() { #0 onnxruntime::BFCArena::~BFCArena (this=0x7fff42f652f0, __in_chrg=) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:80 #1 0x00007ffff51b3fee in onnxruntime::BFCArena::~BFCArena (this=0x7fff42f652f0, __in_chrg=) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:92 #2 0x00007ffff49ecf17 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x7fff42d875d0) at /usr/lib/gcc/x86_64-pc-linux-gnu/12.3.0/include/c++/bits/shared_ptr_base.h:346 #3 std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x7fff42d875d0) at /usr/lib/gcc/x86_64-pc-linux-gnu/12.3.0/include/c++/bits/shared_ptr_base.h:317 $98 = {_vptr.IAllocator = 0x7ffee61332e8 , memory_info_ = {name = 0x7ffeb4035591 "CudaPinned", id = 0, mem_type = OrtMemTypeCPUOutput, alloc_type = OrtDeviceAllocator, device = {static CPU = 0 '\000', static GPU = 1 '\001', static FPGA = 2 '\002', static NPU = 3 '\003', device_type = 0, memory_type = 1, device_id = 0}}} Thread 1 "maa" hit Breakpoint 6, 0x00007fffee456a84 in cudaFreeHost () from /opt/cuda/lib64/libcudart.so.12 (gdb) bt 3 #0 0x00007fffee456a84 in cudaFreeHost () at /opt/cuda/lib64/libcudart.so.12 #1 0x00007ffeb34b3ae1 in onnxruntime::CUDAPinnedAllocator::Free (this=, p=) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/providers/cuda/cuda_allocator.cc:98 #2 0x00007ffff51b3e3d in onnxruntime::BFCArena::~BFCArena (this=0x7fff42f652f0, __in_chrg=) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:82 (More stack frames follow...) 
(gdb) info args No symbol table info available. (gdb) info registers rax 0x7ffee61332e8 140732758438632 rbx 0x7ffd1e5e0a98 140725112933016 rcx 0x0 0 rdx 0x100000001 4294967297 rsi 0x7ffd19200000 140725024980992 rdi 0x7ffd19200000 140725024980992 rbp 0x7fffffffd990 0x7fffffffd990 rsp 0x7fffffffd990 0x7fffffffd990 r8 0x7fff42d8b 34358963595 r9 0x7 7 r10 0x7fff42d8b2d0 140734314885840 r11 0x8e177c1ee1a1c7b3 -8207955323783100493 r12 0x7fff42f652f0 140734316827376 r13 0x7fff42d9c187 140734314955143 r14 0x1ff 511 r15 0x7ffd1d048770 140725090289520 rip 0x7fffee456a84 0x7fffee456a84 eflags 0x206 [ PF IF ] cs 0x33 51 ss 0x2b 43 ds 0x0 0 es 0x0 0 fs 0x0 0 gs 0x0 0 fs_base 0x7ffff7423980 140737341700480 gs_base 0x0 0 (gdb) continue Continuing. Thread 1 "maa" hit Catchpoint 4 (exception thrown), 0x00007ffff18b03b1 in __cxxabiv1::__cxa_throw (obj=0x555556457650, tinfo=0x7ffee6131eb8 , dest=0x7ffeb34a0cb0 ) at /usr/src/debug/gcc/gcc/libstdc++-v3/libsupc++/eh_throw.cc:81 81 PROBE2 (throw, obj, tinfo); (gdb) bt #0 0x00007ffff18b03b1 in __cxxabiv1::__cxa_throw (obj=0x555556457650, tinfo=0x7ffee6131eb8 , dest=0x7ffeb34a0cb0 ) at /usr/src/debug/gcc/gcc/libstdc++-v3/libsupc++/eh_throw.cc:81 #1 0x00007ffeb34b60f4 in onnxruntime::CudaCall (retCode=, exprString=exprString@entry=0x7ffeb403524f "cudaFreeHost(p)", libName=libName@entry=0x7ffeb4035141 "CUDA", successCode=successCode@entry=cudaSuccess, msg=msg@entry=0x7ffeb40350fd "", file=file@entry=0x7ffeb403a730 "/usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/providers/cuda/cuda_allocator.cc", line=98) at /usr/lib/gcc/x86_64-pc-linux-gnu/12.3.0/include/c++/bits/new_allocator.h:90 #2 0x00007ffeb34b3b0d in onnxruntime::CUDAPinnedAllocator::Free (this=, p=) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/providers/cuda/cuda_allocator.cc:98 #3 0x00007ffff51b3e3d in onnxruntime::BFCArena::~BFCArena (this=0x7fff42f652f0, __in_chrg=) at 
/usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:82 #4 0x00007ffff51b3fee in onnxruntime::BFCArena::~BFCArena (this=0x7fff42f652f0, __in_chrg=) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/bfc_arena.cc:92 #5 0x00007ffff49ecf17 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x7fff42d875d0) at /usr/lib/gcc/x86_64-pc-linux-gnu/12.3.0/include/c++/bits/shared_ptr_base.h:346 #6 std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x7fff42d875d0) at /usr/lib/gcc/x86_64-pc-linux-gnu/12.3.0/include/c++/bits/shared_ptr_base.h:317 #7 0x00007ffff4a5f48e in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x7fff42d8b380, __in_chrg=) at /usr/lib/gcc/x86_64-pc-linux-gnu/12.3.0/include/c++/bits/shared_ptr_base.h:1071 #8 std::__shared_ptr::~__shared_ptr (this=0x7fff42d8b378, __in_chrg=) at /usr/lib/gcc/x86_64-pc-linux-gnu/12.3.0/include/c++/bits/shared_ptr_base.h:1524 #9 std::shared_ptr::~shared_ptr (this=0x7fff42d8b378, __in_chrg=) at /usr/lib/gcc/x86_64-pc-linux-gnu/12.3.0/include/c++/bits/shared_ptr.h:175 #10 std::pair >::~pair (this=0x7fff42d8b370, __in_chrg=) at /usr/lib/gcc/x86_64-pc-linux-gnu/12.3.0/include/c++/bits/stl_pair.h:185 #11 std::__new_allocator > > >::destroy > > (__p=0x7fff42d8b370, this=) at /usr/lib/gcc/x86_64-pc-linux-gnu/12.3.0/include/c++/bits/new_allocator.h:181 #12 std::allocator_traits > > > >::destroy > > (__p=0x7fff42d8b370, __a=) at /usr/lib/gcc/x86_64-pc-linux-gnu/12.3.0/include/c++/bits/alloc_traits.h:535 #13 std::_Rb_tree >, std::_Select1st > >, std::less, std::allocator > > >::_M_destroy_node (__p=0x7fff42d8b350, this=) at /usr/lib/gcc/x86_64-pc-linux-gnu/12.3.0/include/c++/bits/stl_tree.h:625 #14 std::_Rb_tree >, std::_Select1st > >, std::less, std::allocator > > >::_M_drop_node (this=, __p=0x7fff42d8b350) at /usr/lib/gcc/x86_64-pc-linux-gnu/12.3.0/include/c++/bits/stl_tree.h:633 #15 std::_Rb_tree >, std::_Select1st 
> >, std::less, std::allocator > > >::_M_erase (__x=0x7fff42d8b350, this=0x7fff42d8b250) at /usr/lib/gcc/x86_64-pc-linux-gnu/12.3.0/include/c++/bits/stl_tree.h:1939 #16 0x00007ffff4a7ea5a in std::_Rb_tree >, std::_Select1st > >, std::less, std::allocator > > >::~_Rb_tree (this=0x7fff42d8b250, __in_chrg=) at /usr/lib/gcc/x86_64-pc-linux-gnu/12.3.0/include/c++/bits/stl_tree.h:736 #17 std::map, std::less, std::allocator > > >::~map (this=0x7fff42d8b250, __in_chrg=) at /usr/lib/gcc/x86_64-pc-linux-gnu/12.3.0/include/c++/bits/stl_map.h:312 #18 std::default_delete, std::less, std::allocator > > > >::operator() (this=, __ptr=0x7fff42d8b250) at /usr/lib/gcc/x86_64-pc-linux-gnu/12.3.0/include/c++/bits/unique_ptr.h:95 #19 std::default_delete, std::less, std::allocator > > > >::operator() (__ptr=0x7fff42d8b250, this=) at /usr/lib/gcc/x86_64-pc-linux-gnu/12.3.0/include/c++/bits/unique_ptr.h:89 #20 std::unique_ptr, std::less, std::allocator > > >, std::default_delete, std::less, std::allocator > > > > >::~unique_ptr (this=, __in_chrg=) at /usr/lib/gcc/x86_64-pc-linux-gnu/12.3.0/include/c++/bits/unique_ptr.h:396 #21 onnxruntime::SessionState::~SessionState (this=0x7fff42f64610, __in_chrg=) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/framework/session_state.h:109 #22 0x00007ffff4a81a91 in std::default_delete::operator() (this=, __ptr=0x7fff42f64610) at /usr/lib/gcc/x86_64-pc-linux-gnu/12.3.0/include/c++/bits/unique_ptr.h:89 #23 std::default_delete::operator() (__ptr=0x7fff42f64610, this=) at /usr/lib/gcc/x86_64-pc-linux-gnu/12.3.0/include/c++/bits/unique_ptr.h:89 #24 std::unique_ptr >::~unique_ptr (this=0x7fff42da9138, __in_chrg=) at /usr/lib/gcc/x86_64-pc-linux-gnu/12.3.0/include/c++/bits/unique_ptr.h:396 #25 onnxruntime::InferenceSession::~InferenceSession (this=0x7fff42da8ae0, __in_chrg=) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/session/inference_session.cc:530 #26 0x00007ffff4a81dee in 
onnxruntime::InferenceSession::~InferenceSession (this=0x7fff42da8ae0, __in_chrg=) at /usr/src/debug/onnxruntime/onnxruntime-opt-cuda/onnxruntime/core/session/inference_session.cc:530
#27 0x00007ffff5d3c568 in Ort::detail::OrtRelease (ptr=) at /usr/include/onnxruntime/onnxruntime_cxx_api.h:124
#28 Ort::detail::Base::~Base (this=0x7fff42a1c6c8, __in_chrg=) at /usr/include/onnxruntime/onnxruntime_cxx_api.h:561
#29 Ort::detail::ConstSessionImpl::~ConstSessionImpl (this=0x7fff42a1c6c8, __in_chrg=) at /usr/include/onnxruntime/onnxruntime_cxx_api.h:994
#30 Ort::detail::SessionImpl::~SessionImpl (this=0x7fff42a1c6c8, __in_chrg=) at /usr/include/onnxruntime/onnxruntime_cxx_api.h:1038
#31 Ort::Session::~Session (this=0x7fff42a1c6c8, __in_chrg=) at /usr/include/onnxruntime/onnxruntime_cxx_api.h:1109
#32 fastdeploy::OrtBackend::~OrtBackend (this=0x7fff42a1c6b0, __in_chrg=) at /usr/src/debug/maa-assistant-arknights/FastDeploy-d0b018ac6c3daa22c7b55b555dc927a5c734d430/fastdeploy/backends/ort/ort_backend.h:57
#33 0x00007ffff5d3c5fe in fastdeploy::OrtBackend::~OrtBackend (this=0x7fff42a1c6b0, __in_chrg=) at /usr/src/debug/maa-assistant-arknights/FastDeploy-d0b018ac6c3daa22c7b55b555dc927a5c734d430/fastdeploy/backends/ort/ort_backend.h:57
#34 0x00007ffff6b97526 in std::default_delete::operator() (this=0x7fff42a1ec78, __ptr=0x7fff42a1c6b0) at /usr/include/c++/13.2.1/bits/unique_ptr.h:99
#35 0x00007ffff6b95542 in std::unique_ptr >::~unique_ptr (this=0x7fff42a1ec78, __in_chrg=) at /usr/include/c++/13.2.1/bits/unique_ptr.h:404
#36 0x00007ffff6b979aa in fastdeploy::Runtime::~Runtime (this=0x7fff42a1e950, __in_chrg=) at /home/arch/projects/MaaAssistantArknights/usr/include/fastdeploy/runtime.h:458
#37 0x00007ffff5d2e157 in std::_Sp_counted_ptr::_M_dispose (this=) at /usr/include/c++/13.2.1/bits/shared_ptr_base.h:428
#38 0x00007ffff6ae90b1 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x7fff42925d80) at /usr/include/c++/13.2.1/bits/shared_ptr_base.h:346
#39 0x00007ffff6af0897 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x7fff42da8a40, __in_chrg=) at /usr/include/c++/13.2.1/bits/shared_ptr_base.h:1071
#40 0x00007ffff6b94460 in std::__shared_ptr::~__shared_ptr (this=0x7fff42da8a38, __in_chrg=) at /usr/include/c++/13.2.1/bits/shared_ptr_base.h:1524
#41 0x00007ffff6b9447c in std::shared_ptr::~shared_ptr (this=0x7fff42da8a38, __in_chrg=) at /usr/include/c++/13.2.1/bits/shared_ptr.h:175
#42 0x00007ffff6b944c2 in fastdeploy::FastDeployModel::~FastDeployModel (this=0x7fff42da8640, __in_chrg=) at /home/arch/projects/MaaAssistantArknights/usr/include/fastdeploy/fastdeploy_model.h:21
#43 0x00007ffff6b97eee in fastdeploy::vision::ocr::Recognizer::~Recognizer (this=0x7fff42da8640, __in_chrg=) at /home/arch/projects/MaaAssistantArknights/usr/include/fastdeploy/vision/ocr/ppocr/recognizer.h:31
#44 0x00007ffff6b97f14 in std::default_delete::operator() (this=0x7ffff7229398 ::get_instance()::unique_instance+24>, __ptr=0x7fff42da8640) at /usr/include/c++/13.2.1/bits/unique_ptr.h:99
#45 0x00007ffff6b95fcc in std::unique_ptr >::~unique_ptr (this=0x7ffff7229398 ::get_instance()::unique_instance+24>, __in_chrg=) at /usr/include/c++/13.2.1/bits/unique_ptr.h:404
#46 0x00007ffff6b9183a in asst::OcrPack::~OcrPack (this=0x7ffff7229388 ::get_instance()::unique_instance+8>, __in_chrg=) at /home/arch/projects/MaaAssistantArknights/src/MaaCore/Config/Miscellaneous/OcrPack.cpp:27
#47 0x00007ffff6af42af in asst::WordOcr::~WordOcr (this=0x7ffff7229380 ::get_instance()::unique_instance>, __in_chrg=) at /home/arch/projects/MaaAssistantArknights/src/MaaCore/Config/Miscellaneous/OcrPack.h:63
#48 0x00007ffff7570b36 in __run_exit_handlers (status=1, listp=0x7ffff770a680 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true, run_dtors=run_dtors@entry=true) at exit.c:108
#49 0x00007ffff7570c80 in __GI_exit (status=) at exit.c:138
#50 0x00007ffff7557cd7 in __libc_start_call_main (main=main@entry=0x5555556e2e30, argc=argc@entry=3, argv=argv@entry=0x7fffffffe218) at ../sysdeps/nptl/libc_start_call_main.h:74
#51 0x00007ffff7557d8a in __libc_start_main_impl (main=0x5555556e2e30, argc=3, argv=0x7fffffffe218, init=, fini=, rtld_fini=, stack_end=0x7fffffffe208) at ../csu/libc-start.c:360
#52 0x00005555555d0205 in _start ()
(gdb)
```

Note that 0x7ffd19200000, which was allocated by cudaMallocHost, failed to be freed by cudaFreeHost when the program terminated, and an exception was thrown. I have no idea how this could happen :thinking:

hunter-kk commented 7 months ago

Hello, I'm a member of MaaAssistantArknights, and the same problem occurs in our program.

Onnxruntime version: 1.15.1, using the prebuilt package https://github.com/microsoft/onnxruntime/releases/download/v1.15.1/onnxruntime-linux-x64-gpu-1.15.1.tgz

Exception:

terminate called after throwing an instance of 'onnxruntime::OnnxRuntimeException'
  what():  /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:121 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:114 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] CUDA failure 4: driver shutting down ; GPU=2000772548 ; hostname=Cryolitia-nixos ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/cuda_allocator.cc ; line=99 ; expr=cudaFreeHost(p); 

core dump:

                #0  0x00007f31a856fd7c __pthread_kill_implementation (libc.so.6 + 0x8cd7c)
                #1  0x00007f31a85209c6 raise (libc.so.6 + 0x3d9c6)
                #2  0x00007f31a85098fa abort (libc.so.6 + 0x268fa)
                #3  0x00007f31a56a9a89 _ZN9__gnu_cxx27__verbose_terminate_handlerEv.cold (libstdc++.so.6 + 0xa9a89)
                #4  0x00007f31a56b4f8a _ZN10__cxxabiv111__terminateEPFvvE (libstdc++.so.6 + 0xb4f8a)
                #5  0x00007f31a56b3ff9 __cxa_call_terminate (libstdc++.so.6 + 0xb3ff9)
                #6  0x00007f31a56b4716 __gxx_personality_v0 (libstdc++.so.6 + 0xb4716)
                #7  0x00007f31a87c2864 _Unwind_RaiseException_Phase2 (libgcc_s.so.1 + 0x17864)
                #8  0x00007f31a87c32bd _Unwind_Resume (libgcc_s.so.1 + 0x182bd)
                #9  0x00007f31134e1364 _ZN11onnxruntime8CudaCallI9cudaErrorLb1EEENSt11conditionalIXT0_EvNS_6common6StatusEE4typeET_PKcS9_S7_S9_S9_i (libonnxruntime_providers_cuda.so + 0xe1364)
                #10 0x00007f31134dd91b _ZN11onnxruntime19CUDAPinnedAllocator4FreeEPv (libonnxruntime_providers_cuda.so + 0xdd91b)
                #11 0x00007f31a7172d7d n/a (libonnxruntime.so.1.15.1 + 0x972d7d)
                #12 0x00007f31a7172f3d n/a (libonnxruntime.so.1.15.1 + 0x972f3d)
                #13 0x00007f31134eebe2 _ZN11onnxruntime21CUDAExecutionProviderD1Ev (libonnxruntime_providers_cuda.so + 0xeebe2)
                #14 0x00007f31134eed1d _ZN11onnxruntime21CUDAExecutionProviderD0Ev (libonnxruntime_providers_cuda.so + 0xeed1d)
                #15 0x00007f31a6a72b8a n/a (libonnxruntime.so.1.15.1 + 0x272b8a)
                #16 0x00007f31a6a72d7d n/a (libonnxruntime.so.1.15.1 + 0x272d7d)
                #17 0x00007f31a7b31ddd _ZN10fastdeploy10OrtBackendD1Ev (libMaaDerpLearning.so + 0x131ddd)
                #18 0x00007f31a7b31e69 _ZN10fastdeploy10OrtBackendD0Ev (libMaaDerpLearning.so + 0x131e69)
                #19 0x00007f31a7b27105 _ZN10fastdeploy7RuntimeD2Ev (libMaaDerpLearning.so + 0x127105)
                #20 0x00007f31a7b273d2 _ZNSt15_Sp_counted_ptrIPN10fastdeploy7RuntimeELN9__gnu_cxx12_Lock_policyE2EE10_M_disposeEv (libMaaDerpLearning.so + 0x1273d2)
                #21 0x00007f31a8188859 _ZN10fastdeploy15FastDeployModelD1Ev (libMaaCore.so + 0x188859)
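
The trace above appears to show the exception thrown by CudaCall unwinding out of a destructor chain and ending in std::terminate, which is what C++ does when an exception escapes a destructor that is implicitly noexcept. The sketch below illustrates one defensive pattern (catch and record inside the destructor). It is a simulation only: `fake_free_host`, `PinnedBuffer` and `destroy_after_shutdown` are hypothetical names, no real CUDA or ONNX Runtime call is made, and this is not necessarily how ONNX Runtime handles the case.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a CUDA call wrapper that throws on failure,
// loosely modelled on the THRW = true behaviour seen in the trace.
inline void fake_free_host(bool driver_alive) {
    if (!driver_alive) {
        throw std::runtime_error("CUDA failure 4: driver shutting down");
    }
}

// Hypothetical resource owner. Its destructor is implicitly noexcept, so an
// escaping exception would call std::terminate; it catches and records instead.
class PinnedBuffer {
public:
    PinnedBuffer(const bool& driver_alive, std::vector<std::string>& errors)
        : driver_alive_(driver_alive), errors_(errors) {}

    ~PinnedBuffer() {
        try {
            fake_free_host(driver_alive_);
        } catch (const std::exception& e) {
            errors_.push_back(e.what());  // report the failure, never rethrow
        }
    }

private:
    const bool& driver_alive_;
    std::vector<std::string>& errors_;
};

// Destroys a buffer after the simulated driver has gone away and returns
// the errors that were recorded instead of aborting the process.
inline std::vector<std::string> destroy_after_shutdown() {
    bool driver_alive = true;
    std::vector<std::string> errors;
    {
        PinnedBuffer buffer(driver_alive, errors);
        driver_alive = false;  // simulated driver teardown before the buffer
    }                          // buffer destroyed here
    return errors;
}
```

Calling `destroy_after_shutdown()` returns a one-element vector holding the error message, and the process keeps running. Removing the try/catch from the destructor would make the same call abort via std::terminate.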

For more technical details:

  1. We use fastdeploy_ppocr in https://github.com/MaaAssistantArknights/MaaAssistantArknights/blob/0ae92d0de5f83a231d906f8e18ad99764ebab67e/src/MaaCore/Config/Miscellaneous/OcrPack.cpp#L124 and create two instances of fastdeploy::Runtime.
  2. Each fastdeploy::Runtime creates an Ort::Session in https://github.com/MaaAssistantArknights/FastDeploy/blob/master/fastdeploy/backends/ort/ort_backend.cc
  3. When the program exits normally with status 0, the "driver shutting down" error occurs.

Could it be that each Ort::Session instance owns its own instance of the CUDA driver, but the CUDA driver is shut down globally when the first instance is destructed, so the second instance then tries to shut down an already-shut-down CUDA driver?

I also encountered a similar problem. ORT presumably has an internal global variable that is released early, so the corresponding data cannot be found when cudaFreeHost is called.
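
To make the teardown-order hypotheses above concrete, here is a small simulation in plain C++. It assumes, without confirmation from the ONNX Runtime source, that the failure comes from sessions being destroyed after the driver state is gone. `FakeDriver`, `FakeSession`, `SessionHolder` and `simulate_exit` are all hypothetical names, and no real CUDA or ONNX Runtime API is called.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for process-wide CUDA driver state.
struct FakeDriver {
    bool alive = true;
};

// Hypothetical stand-in for Ort::Session: its destructor needs the driver.
class FakeSession {
public:
    FakeSession(const FakeDriver& driver, std::vector<std::string>& log)
        : driver_(driver), log_(log) {}

    ~FakeSession() {
        log_.push_back(driver_.alive ? "freed" : "driver shutting down");
    }

private:
    const FakeDriver& driver_;
    std::vector<std::string>& log_;
};

// Hypothetical holder playing the role of a singleton that owns two sessions.
struct SessionHolder {
    std::unique_ptr<FakeSession> det;
    std::unique_ptr<FakeSession> rec;

    // Explicitly destroy both sessions while the driver is still usable.
    void release() {
        det.reset();
        rec.reset();
    }
};

// Simulates process exit. With release_early the sessions are destroyed
// while the driver is still alive; otherwise they outlive it.
inline std::vector<std::string> simulate_exit(bool release_early) {
    FakeDriver driver;
    std::vector<std::string> log;
    {
        SessionHolder holder;
        holder.det = std::make_unique<FakeSession>(driver, log);
        holder.rec = std::make_unique<FakeSession>(driver, log);
        if (release_early) {
            holder.release();
        }
        driver.alive = false;  // simulated driver teardown
    }                          // holder (and any remaining sessions) destroyed here
    return log;
}
```

In this model, `simulate_exit(false)` records "driver shutting down" for both sessions, while `simulate_exit(true)` records "freed" for both. If the hypothesis holds, the equivalent workaround in a real program would be to destroy the objects owning Ort::Session before main returns rather than leaving them to static destruction.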

Kenneth-X commented 7 months ago

I hit the same error and have now solved it. It occurs when another GPU task occupies the GPU and there is not enough GPU memory (1100 MB needed while only 800 MB remains).
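
A pre-flight check along these lines might catch that situation before the session is created. This is only a sketch: `has_enough_gpu_memory` and `kMiB` are hypothetical names, the free-memory figure is assumed to come from the CUDA runtime call cudaMemGetInfo, and the required size is an estimate the caller has to supply.

```cpp
#include <cstddef>

// Bytes per MiB, used to express the sizes quoted in the comment above.
constexpr std::size_t kMiB = 1024 * 1024;

// Returns true when the free GPU memory covers the estimated requirement.
// free_bytes would typically be obtained from cudaMemGetInfo (not called here);
// required_bytes is the caller's own estimate for the model.
inline bool has_enough_gpu_memory(std::size_t free_bytes, std::size_t required_bytes) {
    return free_bytes >= required_bytes;
}
```

With the numbers reported here, `has_enough_gpu_memory(800 * kMiB, 1100 * kMiB)` is false, so the program could report the shortage up front instead of failing later.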