Closed WSUFan closed 3 months ago
Of course, with gcc's unwind, the line number was correct
Hi, thanks for the report. In general it's not entirely surprising that a line number could be off on -O2, however there does seem to be a small cpptrace bug here :)
In this case gcc detects the UB in bad and optimizes it to the following (https://godbolt.org/z/4Yo75eo1n):
bad():
mov DWORD PTR ds:0, 0
ud2
If I try plugging this into my local test setup, modified to print object addresses instead of runtime addresses, I see the same behavior
#0 0x000000000004251f at /lib/x86_64-linux-gnu/libc.so.6
#1 0x0000000000008573 in bad() at /mnt/c/Users/rifkin/home/projects/cpptrace/test/signal_demo.cpp:26:12
#2 0x0000000000007ba3 in main at /mnt/c/Users/rifkin/home/projects/cpptrace/test/signal_demo.cpp:99:8
#3 0x0000000000029d8f in __libc_start_call_main at ./csu/../sysdeps/nptl/libc_start_call_main.h:58:16
#4 0x0000000000029e3f in __libc_start_main_impl at ./csu/../csu/libc-start.c:392:3
#5 0x0000000000008344 at /mnt/c/Users/rifkin/home/projects/cpptrace/build/signal_demo
(where void bad() { is line 26)
Looking at an objdump:
0000000000008570 <_Z3badv>:
void bad() { // ------> this is line 73
8570: f3 0f 1e fa endbr64
*p = 10; // ----------> this is line 78
8574: c7 04 25 00 00 00 00 mov DWORD PTR ds:0x0,0x0
857b: 00 00 00 00
857f: 0f 0b ud2
8581: 66 66 2e 0f 1f 84 00 data16 cs nop WORD PTR [rax+rax*1+0x0]
8588: 00 00 00 00
858c: 0f 1f 40 00 nop DWORD PTR [rax+0x0]
The address returned from libunwind would have corresponded to 0x0000000000008574, the mov DWORD PTR ds:0x0,0x0, but currently the library unconditionally subtracts 1 from the pointer:
https://github.com/jeremy-rifkin/cpptrace/blob/a70cf7935adbc1522d2793dfc64a4ff0dfb19a9a/src/unwind/unwind_with_libunwind.cpp#L62
The reason is that when you call in x86, the instruction pointer written to the stack frame is for the instruction after the call, and subtracting 1 gets the instruction pointer back into the call. For a signal frame, though, I guess the address is just the pointer of the instruction where things went wrong, which makes sense to me.
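To make the off-by-one concrete, here is a minimal sketch of the adjustment described above. The helper name lookup_address and its is_signal_frame flag are hypothetical and are not cpptrace's actual code; with libunwind the flag could plausibly come from unw_is_signal_frame, but that wiring is an assumption and is not shown here.

```cpp
#include <cstdint>

// Hypothetical helper, for illustration only (not cpptrace's implementation).
// Picks the address to use for symbol and line lookup, given the program
// counter the unwinder reported for a frame.
constexpr std::uintptr_t lookup_address(std::uintptr_t pc, bool is_signal_frame) {
    // Normal frame: pc is a return address, i.e. the instruction after the
    // call, so step back one byte to land inside the call instruction.
    // Signal frame: pc already points at the faulting instruction, so it is
    // used unchanged. A null pc is also left alone to avoid wrapping around.
    return (is_signal_frame || pc == 0) ? pc : pc - 1;
}

// Addresses taken from the objdump in this thread: 0x8574 is the faulting mov.
static_assert(lookup_address(0x8574, true) == 0x8574,
              "signal frame keeps the faulting address");
static_assert(lookup_address(0x8574, false) == 0x8573,
              "unconditional subtraction lands in the preceding endbr64");
```

The second assertion reproduces the 0x8573 seen in frame #1 of the trace above, which falls inside endbr64 and therefore resolves to the function's opening line rather than the faulting store.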
As to why clang wouldn't print anything, that's weird but it could be a general dwarf symbol resolution issue I haven't encountered yet. Does it happen when a stack trace is generated outside a signal tracing setup?
I think the core of that issue should now be fixed on dev.
When I tested again locally, that frame was resolving to the vector construction, which seems to be another edge case, but this time in how the inlined subroutine is encoded in DWARF:
0x00007bf7: DW_TAG_subprogram
DW_AT_external (true)
DW_AT_name ("bad")
DW_AT_decl_file ("/mnt/c/Users/rifkin/home/projects/cpptrace/test/signal_demo.cpp")
DW_AT_decl_line (26)
DW_AT_decl_column (0x06)
DW_AT_linkage_name ("_Z3badv")
DW_AT_low_pc (0x0000000000008570)
DW_AT_high_pc (0x0000000000008581)
DW_AT_frame_base (DW_OP_call_frame_cfa)
DW_AT_GNU_all_call_sites (true)
DW_AT_sibling (0x00007d53)
...
0x00007c31: DW_TAG_inlined_subroutine
DW_AT_abstract_origin (0x00007d53 "_ZNSt6vectorIiSaIiEEC4Ev")
DW_AT_entry_pc (0x0000000000008574)
DW_AT_GNU_entry_view (0x02)
DW_AT_low_pc (0x0000000000008574)
DW_AT_high_pc (0x0000000000008574)
DW_AT_call_file ("/mnt/c/Users/rifkin/home/projects/cpptrace/test/signal_demo.cpp")
DW_AT_call_line (29)
DW_AT_call_column (0x14)
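One way to read the dump above is that the inlined subroutine has a zero-length range (low_pc equals high_pc) starting exactly at the faulting address. The following is a sketch under that assumption, not a description of the actual fix: pc_range and contains are hypothetical names. It treats high_pc as exclusive, which matches DWARF's definition of high_pc as the first address past the last instruction, so an empty range matches no address.

```cpp
#include <cstdint>

// Hypothetical types, for illustration only (not cpptrace's implementation).
struct pc_range {
    std::uint64_t low;   // first address covered by the entry
    std::uint64_t high;  // one past the last address (exclusive)
};

// Half-open containment check: an entry with low == high covers nothing.
constexpr bool contains(pc_range r, std::uint64_t pc) {
    return r.low <= pc && pc < r.high;
}

// Values copied from the dump above.
constexpr pc_range bad_subprogram{0x8570, 0x8581};
constexpr pc_range inlined_vector_ctor{0x8574, 0x8574};

static_assert(contains(bad_subprogram, 0x8574),
              "the faulting address is inside bad()");
static_assert(!contains(inlined_vector_ctor, 0x8574),
              "an empty inlined range should not claim the address");
```

A resolver that instead used an inclusive upper bound, or matched on DW_AT_entry_pc alone, would attribute 0x8574 to the vector constructor, which is consistent with the behavior described above.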
I am able to reproduce clang not printing symbols as well; it may be due to:
Terminate called after throwing an instance of cpptrace::detail::internal_error: Cpptrace internal error: Cpptrace assertion failed at /mnt/c/Users/rifkin/home/projects/cpptrace/src/symbols/symbols_with_libdwarf.cpp:610: void cpptrace::detail::libdwarf::dwarf_resolver::retrieve_symbol(const die_object &, Dwarf_Addr, Dwarf_Half, stacktrace_frame &, std::vector<stacktrace_frame> &): Vec should be empty?
ASSERT(vec.size() == 0);
Thank you for your response. I have checked out the dev branch, and I was able to get the correct line numbers with gcc. However, with Clang, -O0 works, but -O2 does not yield any information at all.
Anyway, at least gcc works for now👍
Thanks for your patience. What version of clang are you using? I thought I had been able to reproduce it; however, using clang 18, I am no longer able to.
I'm using clang 14 on Ubuntu 22. I also tried clang 17 on Ubuntu 22. Neither of them work.
Thanks for the additional information. Unfortunately I have not been able to reproduce on either ubuntu 22 or 24 with clang 14, 17, or 18 on debug or release. There is a quirk of optimization surrounding bad()'s undefined behavior on clang that causes the failure path for sigaction to always run, printing sigaction: Resource temporarily unavailable or similar, but that should only affect a test program like yours and only in release.
I'm going to go ahead and close this issue for now since the core issues have been solved, however I'm very interested in hunting down any clang issues. If there is any more info you can provide to help reproduce I would be grateful. My guess would be some discrepancy in how the build is being done affecting where debug symbols are being placed.
I've released v0.6.0 which includes the fix for this
Thanks!
I have tried the signal_safe demo and enabled libunwind. However, the line information in the stack trace appears to be incorrect. Here is my test main program:
The stacktrace generated with libunwind is
However, the expected result is
Any idea why? Is the problem with libunwind? My compile command was
g++ -o test test.cc -g1 -O2 -lcpptrace -ldwarf -lz -lzstd -lunwind -llzma -w -std=c++17 -fpic
If I do -O0 then the information was correct. Also, if I used clang, I got nothing.