jeremy-rifkin / libassert

The most over-engineered C++ assertion library
MIT License
547 stars 37 forks source link

SEGFAULT when using both libassert::stringify and cpptrace #103

Closed MatthiasJ1 closed 2 months ago

MatthiasJ1 commented 2 months ago

Compile command

g++ -g -std=c++20 main.cpp -lassert -lcpptrace -ldwarf

Header for all 3 snippets

#include <vector>
#include <string>
#include <iostream>
#include <libassert/assert.hpp>
using namespace std;

Using libassert::stringify

int main() {
    vector<string> a = {"hi"};
    cout << libassert::stringify(a) << endl;
}
> ./a.out                                                                                                                        
std::vector<std::string>: ["hi"]

Using cpptrace

int main() {
    vector<string> a = {"hi"};
    cout << cpptrace::generate_trace().frames[0].to_string() << endl;
}
> ./a.out                                                                                                                        
0x0000000100a01ce3 in main at main.cpp:10:13

Using both

int main() {
    vector<string> a = {"hi"};
    cout << libassert::stringify(a) << endl;
    cout << cpptrace::generate_trace().frames[0].to_string() << endl;
}
> ./a.out                                                                                                                        
std::vector<std::string>: ["hi"]
zsh: segmentation fault  ./a.out
jeremy-rifkin commented 2 months ago

Hi, thanks for opening this. Any segfaults are a high priority bug. I wasn't able to reproduce locally with the following setup:

// in a cpptrace clone
git checkout v0.7.0
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Debug -DCMAKE_EXPORT_COMPILE_COMMANDS=On -DCPPTRACE_BUILD_TESTING=On -DCMAKE_PREFIX_PATH=~/foobar -DCMAKE_INSTALL_PREFIX=~/foobar -GNinja && ninja install
// in a libassert clone
git checkout v2.1.0
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Debug -DCMAKE_EXPORT_COMPILE_COMMANDS=On -DCPPTRACE_BUILD_TESTING=On -DCMAKE_PREFIX_PATH=~/foobar -DCMAKE_INSTALL_PREFIX=~/foobar -DLIBASSERT_USE_EXTERNAL_CPPTRACE=On -GNinja

Then I compiled the test program using both with g++ test.cpp -isystem ~/foobar/include -L ~/foobar/lib -Wall -lassert -lcpptrace -ldwarf -lzstd -lz

› ./a.out
std::vector<std::string>: ["hi"]
0x0000555745304b63 at /path/to/repro/a.out

How did you compile and install libassert and cpptrace? What versions?

One thing that would additionally be helpful to check is that frames[0] actually exist before using it - if you could verify with .at(0) that would be very helpful.

MatthiasJ1 commented 2 months ago

I verified that frames[0] exists and the segfault is within to_string(). I compiled everything with both clang18 and clang 15 from Conda-Forge on macOS M1 and got the same results.


cpptrace url: https://github.com/jeremy-rifkin/cpptrace/archive/refs/tags/v0.7.0.tar.gz sha256: b5c1fbd162f32b8995d9b1fefb1b57fac8b1a0e790f897b81cdafe3625d12001

mkdir build && cd build
cmake .. -GNinja -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=$PREFIX -DCPPTRACE_USE_EXTERNAL_LIBDWARF=ON
ninja install

libassert url: https://github.com/jeremy-rifkin/libassert/archive/refs/tags/v2.1.0.tar.gz sha256: e42405b49cde017c44c78aacac35c6e03564532838709031e73d10ab71f5363d

mkdir build && cd build
cmake .. -GNinja -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=$PREFIX -DLIBASSERT_USE_EXTERNAL_CPPTRACE=ON -DLIBASSERT_USE_EXTERNAL_MAGIC_ENUM=ON
ninja install

I can give you the full Conda build config files if you'd like and I can also possibly debug this at some point on my machine but I don't have time at the moment.

Here is the build environment for libassert with clang18:

- bzip2 1.0.8
- c-ares 1.33.1
- ca-certificates 2024.8.30
- cctools_osx-arm64 1009.2
- clang 18.1.8
- clang-18 18.1.8
- clang_impl_osx-arm64 18.1.8
- clang_osx-arm64 18.1.8
- clangxx 18.1.8
- clangxx_impl_osx-arm64 18.1.8
- clangxx_osx-arm64 18.1.8
- cmake 3.30.3
- compiler-rt 18.1.8
- compiler-rt_osx-arm64 18.1.8
- icu 75.1
- krb5 1.21.3
- ld64_osx-arm64 907
- libclang-cpp18.1 18.1.8
- libcurl 8.9.1
- libcxx 18.1.8
- libcxx-devel 18.1.8
- libedit 3.1.20191231
- libev 4.33
- libexpat 2.6.3
- libiconv 1.17
- libllvm18 18.1.8
- libnghttp2 1.58.0
- libssh2 1.11.0
- libuv 1.48.0
- libxml2 2.12.7
- libzlib 1.3.1
- llvm-tools 18.1.8
- llvm-tools-18 18.1.8
- ncurses 6.5
- ninja 1.12.1
- openssl 3.3.2
- rhash 1.4.4
- sigtool 0.1.3
- tapi 1300.6.5
- xz 5.2.6
- zstd 1.5.6
- cpptrace 0.7.0
- libdwarf 0.11.0
- magic_enum 0.9.6
jeremy-rifkin commented 2 months ago

Thanks so much, I can reproduce on an M1 Mac. I'm pasting my script below, mainly for my own remembrance. I have clang 15 and used external libdwarf, but looks like that didn't make a difference.

mkdir test && cd test

git clone https://github.com/jeremy-rifkin/cpptrace.git
cd cpptrace
git checkout v0.7.0
mkdir build && cd build
cmake .. -GNinja -DCMAKE_BUILD_TYPE=Release -DCMAKE_PREFIX_PATH=~/foobar -DCMAKE_INSTALL_PREFIX=~/foobar -DCPPTRACE_USE_EXTERNAL_LIBDWARF=Off && ninja install
cd ../..

// in a libassert clone
git clone https://github.com/jeremy-rifkin/libassert.git
cd libassert
git checkout v2.1.0
mkdir build && cd build
cmake .. -GNinja -DCMAKE_BUILD_TYPE=Release -DCMAKE_PREFIX_PATH=~/foobar -DCMAKE_INSTALL_PREFIX=~/foobar -DLIBASSERT_USE_EXTERNAL_CPPTRACE=On -DLIBASSERT_USE_EXTERNAL_MAGIC_ENUM=On && ninja install
cd ../..

mkdir test && cd test
cat <<eos > test.cpp
#include <vector>
#include <string>
#include <iostream>
#include <libassert/assert.hpp>
using namespace std;

int main() {
    vector<string> a = {"hi"};
    cout << libassert::stringify(a) << endl;
    cout << cpptrace::generate_trace().frames.at(0).to_string() << endl;
}
eos

clang++ test.cpp -isystem ~/foobar/include -L ~/foobar/lib -Wall -lassert -lcpptrace -ldwarf -lzstd -lz -std=c++17 -g

Edit: I was able to reproduce with -DCMAKE_BUILD_TYPE=Debug too, that'll make a big difference for triaging this.

Edit: And sanitizers too, fantastic

jeremy-rifkin commented 2 months ago

There's something really spooky going on here. Some initial triage:

There are a series of string format calls, and one fails but the others are fine. The failing one is https://github.com/jeremy-rifkin/cpptrace/blob/0742b42dadaac62436cb226a7d084738a8f82d1a/src/cpptrace.cpp#L147

                str += microfmt::format(":{}", line.value());

That calls https://github.com/jeremy-rifkin/cpptrace/blob/0742b42dadaac62436cb226a7d084738a8f82d1a/src/utils/microfmt.hpp#L281

    template<typename... Args>
    std::string format(const char* fmt, Args&&... args) {
        return detail::format<sizeof...(args)>(fmt, fmt + std::strlen(fmt), {detail::format_value(args)...});
    }

Which calls https://github.com/jeremy-rifkin/cpptrace/blob/0742b42dadaac62436cb226a7d084738a8f82d1a/src/utils/microfmt.hpp#L189

        template<std::size_t N, typename It>
        std::string format(It fmt_begin, It fmt_end, std::array<format_value, N> args) {

It's taking an array by value, there should be no possible way to see array.data() be a pointer in the null page here... But lldb shows that it is. And some printbugging of array.data():

-----------------0x16bd7a560
-----------------0x107f01320
-----------------0x107f01308
-----------------0xa

With the 0xa array.data() originating from the str += microfmt::format(":{}", line.value()); call.

One thing here is that my two libraries, libassert and cpptrace, use this microfmt code, which I'm realizing is an ODR problem. Between cpptrace 0.7.0 and libassert 2.1.0 the files match verbatim. This I don't think is ODR, but I'm going to at least namespace them differently to prevent future problems.

I'm stumped as to the cause here, I'm going to have to pick things apart more in-depth.

jeremy-rifkin commented 2 months ago

Ah, I understand. The format_value class differs pre/post-c++17. I’ll fix this tomorrow and create patch releases for both libraries. This is an embarrassing mistake; thank you so much for the thorough bug report!

Edit to elaborate more, in case my future self or anyone else stumbles across this again: The smoking gun is https://godbolt.org/z/shbfxhbha. The std::string_view changes how the array of format_values is passed and that's a very troubling ABI break. I don't know how it ever worked on x86.

MatthiasJ1 commented 2 months ago

Awesome, thanks for writing/maintaining these libraries!

jeremy-rifkin commented 2 months ago

I've released v2.1.1 with a fix for this