[FR] log stack trace in the fatal signal handler

mirageAlchemy commented 4 days ago

Is your feature request related to a problem? Please describe. Often in a program that is running online, an unexpected segmentation fault might happen. In this scenario, a full stack trace (and possibly the register context) of the thread that triggered the signal could be very useful for tracking down the bug.

Describe the solution you'd like A new enable_stacktrace_printing option in SignalHandlerOptions would be fine. If enable_stacktrace_printing is true, print the stack trace using cpptrace, <stacktrace>, or libunwind.

Additional context I can indeed implement the signal handler myself, but in that case I would not be able to use features in SignalHandlerOptions.

odygrd commented 4 days ago

Hey! I've actually considered and tried implementing something like this a few times.

The main challenges I encountered were around cross-platform support. It’s tricky to handle this in a way that works reliably across different operating systems. Another issue is that printing stack traces from within a signal handler is complicated, as it requires demangling symbols and attempting to display variable information—all while in a restricted context. Signal handlers have limitations, especially because running non-reentrant functions within them can lead to undefined behavior or additional crashes.

That said, I'm planning to explore C++23’s std::stacktrace at some point to see if it offers a more robust approach.

In the meantime, I’ve found a more reliable workaround using a Python script:

Use a Python script to start your C++ process and monitor it as a subprocess, ensuring that core dumps are generated.
When your C++ program crashes, the Python script can detect it.
The script then locates the latest core dump for the crashed process and launches gdb on it, passing in the binary and the core dump.
Finally, it appends the output of the gdb backtrace to the log file of the C++ process.

This approach yields much cleaner and more complete stack traces without needing to handle everything inside a signal handler. It also lets you access additional debugging information while avoiding the pitfalls associated with signal handlers.
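
The launch-and-detect part of this workaround could be sketched roughly as follows. This is only a minimal illustration, not the actual script described in this comment: the helper names (allow_core_dumps, fatal_signal_name, run_and_watch) are hypothetical, and it assumes a POSIX system, where subprocess reports a child killed by signal N as return code -N.

```python
import resource
import signal
import subprocess


def allow_core_dumps():
    """Raise the soft core-size limit to the hard limit (runs in the child)."""
    _, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))


def fatal_signal_name(returncode):
    """Return the name of the signal that killed the child, or None."""
    if returncode is None or returncode >= 0:
        return None  # still running, or exited normally
    try:
        return signal.Signals(-returncode).name
    except ValueError:
        # Signal number without a named constant (e.g. a real-time signal).
        return f"signal {-returncode}"


def run_and_watch(cmd):
    """Run cmd to completion and report the signal that killed it, if any."""
    proc = subprocess.run(cmd, preexec_fn=allow_core_dumps)
    return fatal_signal_name(proc.returncode)
```

Calling run_and_watch(["./my_app"]) (a hypothetical binary) would return "SIGSEGV" after a segmentation fault and None after a normal exit. Whether a core file is actually written also depends on system settings, such as /proc/sys/kernel/core_pattern on Linux.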

mirageAlchemy commented 4 days ago

Hi, thanks for the reply and this amazing piece of work! I understand the concerns:

1. It has to be portable.
2. It should be a header-only library.
3. It should support standards as old as C++20.

While cpptrace is a portable library, it fails requirement 2.
std::stacktrace is a C++23 feature, and it might be too hasty to require all users to move to C++23. Not to mention that Clang does not yet support it.
libunwind is not portable.

Using core dumps is a great idea, but sometimes the core is too large for me to even dump to disk. I might turn to defining the signal handler myself in a less portable way, because I mainly work and run programs on Linux.

Thanks again. I think I will leave the issue open in case any of the above concerns is addressed in the future.

odygrd commented 4 days ago

Yes, the built-in signal handler is optional, so you can easily take the existing code and adapt it into a new signal handler that prints a stack trace.

I’m fine with keeping this issue open—I might revisit it at some point.

If you have some disk space available, using gdb on core dumps generally provides much cleaner and more detailed stack traces.

Another option is to try generating a minimal core dump with a tool like minicoredumper, though I haven’t personally used it.

If you’re interested in giving gdb a try, here’s the Python snippet I use to automatically generate a stack trace from a core dump:

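import subprocess

# bin_path: path to the crashed binary; latest_core_dump: path to its core file.
# --batch makes gdb exit after running the commands; "bt full" also prints local variables.
gdb_output = subprocess.run(
    ["gdb", "--batch", "-ex", "bt full", "-ex", "quit", bin_path, latest_core_dump],
    capture_output=True, text=True
)

# logfile: the C++ process's log file, opened for appending.
logfile.write("\nStack trace from GDB:\n")
logfile.write(gdb_output.stdout)
logfile.write("\n")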
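
The snippet above uses latest_core_dump without showing how it is found. One possible way to pick it is sketched below, under the assumption that core files land in a known directory and have names starting with "core". Both the helper name find_latest_core_dump and that naming pattern are illustrative; on Linux the real location and name are governed by /proc/sys/kernel/core_pattern.

```python
from pathlib import Path


def find_latest_core_dump(core_dir, pattern="core*"):
    """Return the most recently modified file matching pattern, or None."""
    # Assumption: core files live directly in core_dir and match the pattern.
    candidates = [p for p in Path(core_dir).glob(pattern) if p.is_file()]
    if not candidates:
        return None
    # Pick the newest file by modification time.
    return max(candidates, key=lambda p: p.stat().st_mtime)
```

The returned path (converted with str() if preferred) could then be passed to gdb in place of latest_core_dump. If several processes can crash around the same time, matching on the PID in the file name would be more robust than taking the newest file.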

odygrd / quill

[FR] log stack trace in the fatal signal handler #627