halide / Halide

a language for fast, portable data-parallel computation
https://halide-lang.org

Backtraces fail when debugging inside a pipeline. #8440

Open mcourteaux opened 1 month ago

mcourteaux commented 1 month ago

I'm using x86-64-linux AOT-generated pipelines, statically linked into my binary. When I hit a halide_assert() while running in the debugger, the backtrace never shows me where the pipeline function was actually called. It seems that calling conventions are not respected, so backtrace algorithms fail. I get something like this:

#0  0x00007ffff747a664 in __pthread_kill_implementation () at /usr/lib64/libc.so.6
#1  0x00007ffff7421c4e in raise () at /usr/lib64/libc.so.6
#2  0x00007ffff7409902 in abort () at /usr/lib64/libc.so.6
#3  0x0000000000980549 in halide_default_error ()
#4  0x0000000000989cbe in Halide::Runtime::Internal::(anonymous namespace)::HeapPrinter<(Halide::Runtime::Internal::PrinterType)1, 1024ul>::~HeapPrinter() [clone .120] ()
#5  0x000000000098a096 in halide_error_access_out_of_bounds ()
#6  0x0000000000873c87 in neonraw_bilateral_grid_loglum_constructor_8-x86-64-linux-avx2-fma-profile-cuda-no_bounds_query-no_runtime ()
#7  0x00000000008723a5 in neonraw_bilateral_grid_loglum_constructor_8 ()
#8  0x0000000000fcf040 in ??? ()
#9  0x3f0758b53ca0a0a1 in ??? ()
#10 0xbe651cc63f605ac1 in ??? ()
#11 0x00000000300040c8 in ??? ()
#12 0x0000000000000000 in ??? ()

So everything up to frame #7 seems fine, but everything after it is total gibberish. Note that the function does return correctly and there are no bugs in control flow. It's just not debuggable if you can't get to the call site.
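To check whether the broken trace is specific to the debugger or also affects in-process unwinding, one option is to capture a backtrace from an error handler. The block below is a minimal sketch, not something from this report: it assumes glibc on Linux (`backtrace()` and `backtrace_symbols()` from `<execinfo.h>`), and `capture_backtrace` and `backtrace_error_handler` are hypothetical names. The handler mimics the `void (*)(void *, const char *)` shape that `halide_set_error_handler` in HalideRuntime.h is assumed to accept; verify that against your Halide version. It may well stop at the same frame the debugger does if the caller lacks unwind information.

```cpp
#include <execinfo.h>  // glibc-specific: backtrace(), backtrace_symbols()

#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <string>
#include <vector>

// Capture up to max_frames return addresses of the calling thread and
// turn them into printable strings. This relies on glibc's unwinder, so it
// can stop early at a frame whose caller cannot be recovered.
std::vector<std::string> capture_backtrace(int max_frames) {
    assert(max_frames > 0);
    std::vector<void *> addrs(static_cast<std::size_t>(max_frames));
    const int depth = backtrace(addrs.data(), max_frames);

    std::vector<std::string> frames;
    char **symbols = backtrace_symbols(addrs.data(), depth);
    if (symbols == nullptr) {
        return frames;  // symbolization failed; report nothing
    }
    for (int i = 0; i < depth; i++) {
        frames.emplace_back(symbols[i]);
    }
    std::free(symbols);  // only the array is freed, not the individual strings
    return frames;
}

// Hypothetical error handler (assumed signature, see the text above).
// It prints the message and the captured frames to stderr and then returns;
// a real handler would decide for itself whether to abort.
void backtrace_error_handler(void * /*user_context*/, const char *msg) {
    std::fprintf(stderr, "Halide error: %s\n", msg);
    const std::vector<std::string> frames = capture_backtrace(64);
    for (std::size_t i = 0; i < frames.size(); i++) {
        std::fprintf(stderr, "#%zu %s\n", i, frames[i].c_str());
    }
}
```

If the assumption about the runtime API holds, the handler would be installed once at startup with `halide_set_error_handler(backtrace_error_handler);`. Comparing its output with the debugger's trace shows whether both lose the call site at the same frame.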

alexreinking commented 1 month ago

What C++ compiler and version are you using?

abadams commented 1 month ago

It's a standard function call, but I think we're doing the equivalent of -fomit-frame-pointer, because that's the default behavior for O3. I can't figure out how to turn it off though in the LLVM API...

mcourteaux commented 1 month ago

> It's a standard function call, but I think we're doing the equivalent of -fomit-frame-pointer, because that's the default behavior for O3.

Yeah, but then why does the stack trace work within the AOT-compiled pipeline and the AOT-compiled runtime? It makes me think that only the entry code is doing something weird with frame pointers.

mcourteaux commented 1 month ago

> What C++ compiler and version are you using?

My project (and I believe Halide too) is being compiled with this:

❯ clang-18 --version
clang version 18.1.8 (Fedora 18.1.8-1.fc40)
Target: x86_64-redhat-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
Configuration file: /etc/clang/x86_64-redhat-linux-gnu-clang.cfg