Closed by godlygeek 2 months ago
Does it matter that this is happening in a shared library?
No, apparently not. I've updated my reproducer to drop the shared library and move everything directly into the main executable.
Thanks.
Is there a reason for this line?
objcopy --add-section .gnu_debugdata=mini_debuginfo.xz main.debug
Here main.debug already contains debug info, so why bother to add a .gnu_debugdata section that points to another copy of the debug info? Am I missing something there? Thanks.
You're not missing anything. It is indeed a weird thing to do, but strange debug info shouldn't crash libbacktrace 😄
The real-world case where we encountered this isn't quite as strange. The Memray memory profiler vendors a copy of libbacktrace, which we patch to support lookups against debuginfod. We hit this stack overflow because a file had been uploaded to our debuginfod server that didn't actually contain full debug info, only the .gnu_debugdata section containing the MiniDebugInfo.
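One way to spot such a file before it reaches a debuginfod server is to look at its section headers. The following is a minimal sketch, not something from Memray or libbacktrace: `classify_debug_sections` is a hypothetical helper that reads `readelf -S -W` output on stdin, and it assumes that the presence of a `.debug_info` (or compressed `.zdebug_info`) section is a good enough proxy for "full debug info".

```shell
# Hypothetical helper (not part of Memray or libbacktrace).
# Reads `readelf -S -W FILE` output on stdin and prints one of:
#   full       a .debug_info / .zdebug_info section is present
#   mini-only  no DWARF info section, but .gnu_debugdata is present
#   none       neither was found
classify_debug_sections() {
    sections=$(cat)
    if printf '%s\n' "$sections" | grep -Eq '[[:space:]]\.z?debug_info[[:space:]]'; then
        echo full
    elif printf '%s\n' "$sections" | grep -Eq '[[:space:]]\.gnu_debugdata[[:space:]]'; then
        echo mini-only
    else
        echo none
    fi
}

# Example usage (assumes readelf is installed and ./main.debug exists):
#   readelf -S -W main.debug | classify_debug_sections
```

A file that reports `mini-only` is the kind of upload described above: it carries MiniDebugInfo but none of the DWARF sections a debug file is expected to provide.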
These steps are considerably closer to the "real" case where we hit this issue:
...
objcopy --only-keep-debug main main.debug
objcopy -S --remove-section .gdb_index --remove-section .comment --keep-symbols=symbols_to_keep.txt main.debug mini_debuginfo
strip --strip-all -R .comment main
xz -f mini_debuginfo
objcopy --add-section .gnu_debugdata=mini_debuginfo.xz main
# Install the main binary itself into /usr/lib/debug/.build-id
build_id=$(readelf -n main | sed -n '/^.*Build ID:\s*/s///p')
first_two=${build_id:0:2}
rest=${build_id:2}
echo "Installing /usr/lib/debug/.build-id/$first_two/$rest.debug"
sudo sh -c "mkdir -p /usr/lib/debug/.build-id/$first_two && cp main /usr/lib/debug/.build-id/$first_two/$rest.debug"
...
Same idea as above, except the actual split debug info is discarded, and the /usr/lib/debug/.build-id directory is populated with a copy of the executable itself, with only the MiniDebugInfo and not the full debug info.
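The mapping from a build ID to its location under /usr/lib/debug/.build-id, which the install step above computes inline, can be written as a small function. This is only a sketch: `build_id_debug_path` is a hypothetical name, and it assumes the build ID is given as lowercase hex, as `readelf -n` prints it.

```shell
# Hypothetical helper: print the .build-id path for a hex build ID.
# The first two hex digits name the directory; the rest names the file.
build_id_debug_path() {
    id=$1
    case $id in
        ''|*[!0-9a-f]*)
            echo "invalid build ID: $id" >&2
            return 1
            ;;
    esac
    if [ "${#id}" -le 2 ]; then
        echo "build ID too short: $id" >&2
        return 1
    fi
    rest=${id#??}            # everything after the first two characters
    first_two=${id%"$rest"}  # the first two characters
    printf '/usr/lib/debug/.build-id/%s/%s.debug\n' "$first_two" "$rest"
}

build_id_debug_path abcdef0123   # prints /usr/lib/debug/.build-id/ab/cdef0123.debug
```

The function uses only POSIX parameter expansion, so it does not depend on the bash-specific `${build_id:0:2}` substring syntax.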
I'm updating Memray's vendored libbacktrace to do:
diff --git a/elf.c b/elf.c
index 107e26c..e62668b 100644
--- a/elf.c
+++ b/elf.c
@@ -6841,7 +6876,8 @@ elf_add (struct backtrace_state *state, const char *filename, int descriptor,
             }
         }
 
-      if (!gnu_debugdata_view_valid
+      if (!debuginfo
+          && !gnu_debugdata_view_valid
           && strcmp (name, ".gnu_debugdata") == 0)
         {
           if (!elf_get_view (state, descriptor, memory, memory_size,
I'm pretty sure that's correct: if we've already found real debug info, we can just ignore any MiniDebugInfo. Let me know if I have that wrong 😅
Thanks, this should be fixed in the repo now.
If a shared library contains an NT_GNU_BUILD_ID note, elf_add will attempt to load its debug info from /usr/lib/debug/.build-id, recursing to process the located debug info if it's found. If that debug info contains a .gnu_debugdata section, elf_add decompresses that MiniDebugInfo and recurses again to process it. If the MiniDebugInfo contains the same NT_GNU_BUILD_ID note as the original shared library, this pairwise recursion will repeat forever, until either the ulimit -n limit on the number of open file descriptors is reached or the stack overflows.
For a minimal-ish reproducer:
Tracked down while investigating https://github.com/bloomberg/memray/issues/636
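The pairwise recursion described in this report, and the effect of a guard like the `!debuginfo` check in the patch earlier in the thread, can be sketched as a toy state machine. None of this is libbacktrace code: `simulate_lookup_chain`, its state names, and the step limit are all made up for illustration, and the model only follows the behaviour as described here (a build-ID lookup leads to a debug file, whose .gnu_debugdata leads to a MiniDebugInfo carrying the same build ID).

```shell
# Toy model only; every name here is hypothetical.
# Usage: simulate_lookup_chain GUARD MAX_STEPS
#   GUARD=1  skip .gnu_debugdata when the file was reached via build-ID lookup
#   GUARD=0  always follow .gnu_debugdata (the behaviour reported in this issue)
simulate_lookup_chain() {
    guard=$1
    max_steps=$2
    kind=binary   # binary | buildid_debug | minidebuginfo
    steps=0
    while [ "$steps" -lt "$max_steps" ]; do
        steps=$((steps + 1))
        case $kind in
            binary|minidebuginfo)
                # Has a build-ID note: look up the .build-id file and process it.
                kind=buildid_debug
                ;;
            buildid_debug)
                if [ "$guard" -eq 1 ]; then
                    # Already in a debug file: ignore its MiniDebugInfo and stop.
                    echo "terminated after $steps steps"
                    return 0
                fi
                # Decompress the MiniDebugInfo and process it next.
                kind=minidebuginfo
                ;;
        esac
    done
    echo "still recursing after $max_steps steps"
    return 1
}

simulate_lookup_chain 0 20 || true   # prints: still recursing after 20 steps
simulate_lookup_chain 1 20           # prints: terminated after 2 steps
```

The step limit stands in for the resource that runs out in the real case (file descriptors or stack), so the unguarded run reports that it never settled instead of looping forever.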