jeremy-rifkin / cpptrace

Simple, portable, and self-contained stacktrace library for C++11 and newer
MIT License

Use after free when using split dwarf #141

Closed tsvstar closed 1 month ago

tsvstar commented 2 months ago

I ran into two issues when trying to use the library in a large project on Linux. They may or may not be related. The project is built on CentOS Linux 7 with gcc-11.2 and is statically linked.

1. Coredump

Nothing special is needed to trigger it: simply requesting a stacktrace crashes.

Thread 1 "cserver" received signal SIGSEGV, Segmentation fault.
#0  0x00000000070e7da6 in __gnu_cxx::__exchange_and_add (__val=-1, __mem=0xfffffffffffffff8) at /linopt/gcc/gcc-11.2.0-rh6/include/c++/11.2.0/ext/atomicity.h:66
#1  __gnu_cxx::__exchange_and_add_dispatch (__val=-1, __mem=0xfffffffffffffff8) at /linopt/gcc/gcc-11.2.0-rh6/include/c++/11.2.0/ext/atomicity.h:101
#2  std::string::_Rep::_M_dispose (this=0xffffffffffffffe8, __a=...) at /linopt/gcc/gcc-11.2.0-rh6/include/c++/11.2.0/bits/basic_string.h:3347
#3  0x00000000070e7994 in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string (this=0x7ffcbf6a9490, __in_chrg=<optimized out>) at /linopt/gcc/gcc-11.2.0-rh6/include/c++/11.2.0/bits/basic_string.h:3766
#4  0x000000001230730a in cpptrace::stacktrace_frame::~stacktrace_frame (this=0x7ffcbf6a9470, __in_chrg=<optimized out>) at /devel/cpptrace/include/cpptrace/cpptrace.hpp:150
#5  0x00000000127b5990 in cpptrace::detail::frame_with_inlines::~frame_with_inlines() ()
#6  0x00000000127b5286 in cpptrace::detail::libdwarf::resolve_frames(std::vector<cpptrace::object_frame, std::allocator<cpptrace::object_frame> > const&) ()
#7  0x00000000127aec17 in cpptrace::detail::resolve_frames(std::vector<unsigned long, std::allocator<unsigned long> > const&) ()
#8  0x00000000127a3aa5 in cpptrace::generate_trace(unsigned long, unsigned long) ()
#9  0x00000000127a3a4a in cpptrace::generate_trace(unsigned long) ()

2. Fail to read object file.

Before the crash, the same generate_trace call reports multiple errors of the form "Cpptrace internal error: Unable to read object file main". I can't step into cpptrace with a debugger (whether it was built in RelWithDebInfo, Debug, or Release mode), but modifying it revealed that this happens inside the #else branch of object.cpp (!defined(CPPTRACE_HAS_DL_FIND_OBJECT) && !defined(HAS_DLADDR1)).

"main" is the name used in CMakeLists.txt in target_link_libraries(main ....) and is also the name of the directory. It also appears in the boost::stacktrace output ( ... in main) for that stacktrace. But no file named "main" exists at all, and the executable in the current directory has a different name (cserver).

add_executable(cserver cserver_init.cpp)
target_link_libraries(cserver main)

I'm not interested in object file information, but I'm interested in the line information. I'm not sure why this info is needed and if the library will still be workable without loading object files.

jeremy-rifkin commented 2 months ago

Can you try with version 0.6.2? If you’re on centos 7 dladdr1 should exist but there was an issue with it not being selected properly in 0.6.1.

tsvstar commented 2 months ago

On 0.6.2 the second problem (Unable to read object file main) is gone. The correct path to the binary is now shown. But the first problem (the crash) still occurs in the same place.

jeremy-rifkin commented 2 months ago

Thanks, glad the other issue was resolved. A segfault is concerning; unfortunately, I probably need to be able to reproduce it in order to debug it. I can spin up a centos 7 box later to try. Can you send the exact code you’re using just so I can be sure I am using the same setup? Another helpful data point, if it’s not too much work, would be to turn on address sanitizer (both for the library and binary) and see if it catches anything.

To answer the question at the end of the issue:

I'm not interested in object file information, but I'm interested in the line information. I'm not sure why this info is needed and if the library will still be workable without loading object files.

Object information has to be resolved in order to get line info. For a given instruction pointer in runtime address space, cpptrace must figure out what object it came from (either the main executable or a library .so), then any runtime address randomization must be reversed, then any offsets from the ELF must be reversed, and only then can the appropriate debug information be looked up in the executable/library object.
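
The translation described above can be sketched as follows. This is only an illustration under stated assumptions, not cpptrace's implementation: `loaded_object`, `object_address`, and `to_object_address` are hypothetical names, and the list of loaded objects is supplied by hand here, whereas a real library would obtain it from the dynamic loader.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical description of one object mapped into the process
// (the main executable or a shared library).
struct loaded_object {
    std::string path;          // file on disk that holds the debug information
    std::uintptr_t base;       // runtime load address, including any randomization
    std::uintptr_t size;       // size of the mapped image
    std::uintptr_t image_base; // address the ELF file itself expects to be loaded at
};

// Result of translating a runtime instruction pointer.
struct object_address {
    bool found;
    std::string path;
    std::uintptr_t address; // address as the object's debug information sees it
};

// Find the object that contains pc, undo the runtime load offset, and
// re-apply the ELF's own base so the result can be looked up in debug info.
object_address to_object_address(const std::vector<loaded_object>& objects, std::uintptr_t pc) {
    for(const auto& obj : objects) {
        if(pc >= obj.base && pc - obj.base < obj.size) {
            return {true, obj.path, pc - obj.base + obj.image_base};
        }
    }
    return {false, "", 0};
}
```

For a position-independent library mapped at 0x7ffff7a00000 with an image base of 0, a runtime address of 0x7ffff7a01234 translates to 0x1234 in that library. For a non-PIE executable whose runtime base equals its image base, the address comes back unchanged.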

jeremy-rifkin commented 2 months ago

I set up a centos 7 VM but was unable to reproduce. I used devtoolset-10 while testing.

tsvstar commented 2 months ago

Here are my results.

1) I created a small, simple project that adds the library and makes basic calls. It works in a "Debug" build. In a RelWithDebInfo build it doesn't resolve symbols (the symbol fields in frames produced by cpptrace::generate_trace() are empty). The .print() output looks like "#0 0x0000000000406e24 at /devel/cpptrace/my/MyProject".

2) I made a RelWithDebInfo+sanitizer build of the library and added it to my project as an external lib. It crashes with the same trace as in the first message. The sanitizer wasn't triggered.

3) I added a lot of debugging output (in particular, logging in every ctor/dtor/assignment of stacktrace_frame to track them). This revealed something strange: first multiple null_frame objects were created, then the std::vector trace at symbols_with_libdwarf.cpp:92 created numerous copies of them, and surprisingly at this point both string values had a c_str() pointing to inaccessible memory.

4) I embedded the library in the project (copied src, include, and CMakeLists.txt, and added it as a subdirectory). After a few fixes (paths, configuration), I was able to run it in pure Debug+sanitizer. At first, no CPPTRACE_GET_SYMBOLS_WITH_ and CPPTRACE_CPPTRACE_DEMANGLE_WITH_ symbols were defined, so detail::resolve_frames did nothing. But immediately after that, it crashed in cpptrace::detail::demangle because frame.symbol referred to inaccessible memory. Still no sanitizer report.

5) Then I explicitly defined CPPTRACE_GET_SYMBOLS_WITH_LIBDWARF and CPPTRACE_CPPTRACE_DEMANGLE_WITH_CXXABI, and now the library almost works. It doesn't crash and it shows the correct address, but it doesn't show the symbol.

#0  cpptrace::detail::libdwarf::dwarf_resolver::retrieve_symbol (this=0x613000032380, cu_die=..., pc=187963276, dwversion=4, frame=..., inlines=std::vector of length 0) at src/cpptrace/src/symbols/dwarf/dwarf_resolver.cpp:595
#1   in cpptrace::detail::libdwarf::dwarf_resolver::resolve_frame_core (this=0x613000032380, object_frame_info=..., frame=..., inlines=std::vector of length 0) at src/cpptrace/src/symbols/dwarf/dwarf_resolver.cpp:1017
#2   in cpptrace::detail::libdwarf::dwarf_resolver::resolve_frame (this=0x613000032380, frame_info=...) at /src/cpptrace/src/symbols/dwarf/dwarf_resolver.cpp:1053
#3   in cpptrace::detail::libdwarf::dwarf_resolver::perform_dwarf_fission_resolution (this=0x613000034840, cu_die=..., dwo_name=..., object_frame_info=..., frame=..., inlines=std::vector of length 0) at src/cpptrace/src/symbols/dwarf/dwarf_resolver.cpp:990
#4   in cpptrace::detail::libdwarf::dwarf_resolver::resolve_frame_core (this=0x613000034840, object_frame_info=..., frame=..., inlines=std::vector of length 0) at src/cpptrace/src/symbols/dwarf/dwarf_resolver.cpp:1014
#5   in cpptrace::detail::libdwarf::dwarf_resolver::resolve_frame (this=0x613000034840, frame_info=...) at src/cpptrace/src/symbols/dwarf/dwarf_resolver.cpp:1053
#6   in cpptrace::detail::libdwarf::resolve_frames (frames=std::vector of length 17 = {...}) at src/cpptrace/src/symbols/symbols_with_libdwarf.cpp:103
#7   in cpptrace::detail::resolve_frames (frames=std::vector of length 17 = {...}) at src/cpptrace/src/symbols/symbols_core.cpp:134
#8   in cpptrace::raw_trace::resolve (this=0x7fbdf228e280) at src/cpptrace/src/cpptrace.cpp:48

preprocess_subprograms always returns an empty vector.

tsvstar commented 2 months ago

UPD: I added the same options as in prod to the simple project, and now the sanitizer triggers on heap-use-after-free. Attachments: MyProject.zip, sanitizer_report.txt

jeremy-rifkin commented 2 months ago

This information is immensely helpful, thanks so much for taking the time to look into this. It's especially helpful to know it might be an issue in the split dwarf code path. Something does definitely seem wrong so I'll try to understand what's going on here.

jeremy-rifkin commented 2 months ago

I was able to reproduce a sanitizer error in your setup doing a RelWithDebInfo build. I built cpptrace with sanitizers and then used a trimmed cmake file:

cmake_minimum_required(VERSION 3.10)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fno-omit-frame-pointer -fno-optimize-sibling-calls -gdwarf-4 -gsplit-dwarf")

project(repro CXX)

list(APPEND CMAKE_PREFIX_PATH "path/to/projects/cpptrace/build/foo")

find_package(cpptrace REQUIRED)

add_executable(repro main.cpp)
target_link_libraries(repro PRIVATE cpptrace::cpptrace)
target_link_options(repro PRIVATE -gsplit-dwarf -fsanitize=address)

==3794==ERROR: AddressSanitizer: heap-use-after-free on address 0x6060000e3da0 at pc 0x55d33537e687 bp 0x7ffdfdd0d870 sp 0x7ffdfdd0d860
READ of size 8 at 0x6060000e3da0 thread T0
    #0 0x55d33537e686 in dwarf_dealloc_die /mnt/c/Users/rifkin/home/projects/cpptrace/build/_deps/libdwarf-src/src/lib/libdwarf/dwarf_alloc.c:786
    #1 0x55d33531cdaf in cpptrace::detail::libdwarf::die_object::~die_object() /mnt/c/Users/rifkin/home/projects/cpptrace/src/symbols/dwarf/../../utils/dwarf.hpp:71
    #2 0x55d33532fda3 in cpptrace::detail::libdwarf::skeleton_info::~skeleton_info() /mnt/c/Users/rifkin/home/projects/cpptrace/src/symbols/dwarf/dwarf_resolver.cpp:74
    #3 0x55d335346a4c in cpptrace::detail::optional<cpptrace::detail::libdwarf::skeleton_info, 0>::reset() /mnt/c/Users/rifkin/home/projects/cpptrace/src/symbols/dwarf/../../binary/../utils/utils.hpp:293
    #4 0x55d335337085 in cpptrace::detail::optional<cpptrace::detail::libdwarf::skeleton_info, 0>::~optional() /mnt/c/Users/rifkin/home/projects/cpptrace/src/symbols/dwarf/../../binary/../utils/utils.hpp:212
    #5 0x55d33532387d in cpptrace::detail::libdwarf::dwarf_resolver::~dwarf_resolver() /mnt/c/Users/rifkin/home/projects/cpptrace/src/symbols/dwarf/dwarf_resolver.cpp:216
    #6 0x55d3353239ad in cpptrace::detail::libdwarf::dwarf_resolver::~dwarf_resolver() /mnt/c/Users/rifkin/home/projects/cpptrace/src/symbols/dwarf/dwarf_resolver.cpp:216
    #7 0x55d33534bee6 in std::default_delete<cpptrace::detail::libdwarf::dwarf_resolver>::operator()(cpptrace::detail::libdwarf::dwarf_resolver*) const /usr/include/c++/11/bits/unique_ptr.h:85
    #8 0x55d33533ef60 in std::unique_ptr<cpptrace::detail::libdwarf::dwarf_resolver, std::default_delete<cpptrace::detail::libdwarf::dwarf_resolver> >::~unique_ptr() /usr/include/c++/11/bits/unique_ptr.h:361
    #9 0x55d335367323 in std::pair<unsigned long long const, std::unique_ptr<cpptrace::detail::libdwarf::dwarf_resolver, std::default_delete<cpptrace::detail::libdwarf::dwarf_resolver> > >::~pair() /usr/include/c++/11/bits/stl_pair.h:211

I figured out the issue: it's a very subtle problem with destructor ordering that I'd run into previously but hadn't handled for split dwarf. The fix should be the following; I'll commit it later.

diff --git a/src/symbols/dwarf/dwarf_resolver.cpp b/src/symbols/dwarf/dwarf_resolver.cpp
index 7627cd6..1c47a17 100644
--- a/src/symbols/dwarf/dwarf_resolver.cpp
+++ b/src/symbols/dwarf/dwarf_resolver.cpp
@@ -208,6 +208,8 @@ namespace libdwarf {
             }
             // subprograms_cache needs to be destroyed before dbg otherwise there will be another use after free
             subprograms_cache.clear();
+            split_full_cu_resolvers.clear();
+            skeleton.reset();
             if(aranges) {
                 dwarf_dealloc(dbg, aranges, DW_DLA_LIST);
             }
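
The ordering problem behind this fix can be illustrated with a small stand-alone model. This is a sketch under stated assumptions, not cpptrace's code: `context`, `child`, `resolver`, and `events` are hypothetical stand-ins for a library handle, an RAII wrapper that needs that handle, the owning class, and a log of teardown steps. In C++, members are destroyed only after the destructor body has finished, so anything the body tears down by hand is gone before members that still depend on it are released, unless those members are cleared explicitly first.

```cpp
#include <memory>
#include <string>
#include <vector>

// Records teardown steps so the order can be inspected.
static std::vector<std::string> events;

// Hypothetical stand-in for a C library handle.
struct context {
    bool alive = true;
};

// Hypothetical stand-in for an RAII wrapper whose destructor needs the handle.
struct child {
    context* ctx;
    explicit child(context* c) : ctx(c) {}
    ~child() {
        events.push_back(ctx->alive ? "child freed safely" : "child freed after context");
    }
};

struct resolver {
    context ctx;
    std::vector<std::unique_ptr<child>> cache;
    bool clear_first;

    explicit resolver(bool clear_first_) : clear_first(clear_first_) {
        cache.emplace_back(new child(&ctx));
    }

    ~resolver() {
        if(clear_first) {
            cache.clear(); // release dependents while the handle is still valid
        }
        ctx.alive = false; // models tearing down the handle inside the destructor body
        events.push_back("context finished");
        // Remaining members are destroyed only after this body returns.
    }
};
```

Destroying `resolver(false)` logs "context finished" followed by "child freed after context", which is the shape of the use after free reported above. Destroying `resolver(true)` logs "child freed safely" first, mirroring the explicit clear() and reset() calls in the diff.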

tsvstar commented 2 months ago

That seems to fix the heap-use-after-free issue. But in production I'm still not able to get symbols: I correctly see file, line, and column, but the symbol is empty (please see item 5 in my message above). Turning on the dump_dwarf and trace_dwarf flags gives nothing valuable.

Starting resolution for ..../src/file.c.dwo 0b34178c
..../src/file.c.dwo
b34178c
End walk_dbg

and then hundreds of End walk_die_list lines.

The given path and file file.c.dwo exist.

jeremy-rifkin commented 2 months ago

Hi, there’s a chance this could be related to an upstream libdwarf issue regarding how rangelists are handled. Does item 5 happen when you haven’t pulled cpptrace into your project directly and instead link to a copy built elsewhere?