Open losfair opened 4 years ago
Thanks for the report! Can you perhaps get a stack trace in a debugger for this?
One common issue I've seen is that the sigaltstack is too small, so this may be a "double" stack overflow where the trap_handler
overflows the sigaltstack, causing a second segfault.
I tried to allocate a 1MB sigaltstack, but the error persists:
    let mut stack_space = vec![0u8; 1048576];
    let new_stack = libc::stack_t {
        ss_sp: stack_space.as_mut_ptr() as *mut _,
        ss_flags: 0,
        ss_size: 1048576,
    };
    assert_eq!(libc::sigaltstack(&new_stack, ptr::null_mut()), 0);
I wasn't able to get a stack trace because the debugger can't resume execution from the signal handler after an EXC_BAD_ACCESS
exception, due to a Darwin kernel bug.
Ah sorry, but without the ability to reproduce or debug this I'm not really sure what's going on here, so I can't really help a whole lot :(
Excerpting relevant comments from the PR that adds a test to demonstrate this:
This library is not async signal safe, but it is safe for synchronous signals. In this case generating a backtrace from a segfault handler is intended to work.
—alexcrichton
Whether a signal is generated in a synchronous or asynchronous manner doesn't change the fact that the signal handler can only use async-signal-safe functions.
Take for example one reason why this crate isn't safe to use from a signal handler: the use of memory allocation routines. If a signal is generated during an execution of malloc, which holds an internal lock, and the signal handler then allocates memory and needs to acquire the same lock, a deadlock will occur.
—tmiasko
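The deadlock scenario described above can be sketched without real signals. This is only an illustration under stated assumptions: `ALLOC_LOCK` is a hypothetical stand-in for an allocator's internal lock, the "handler" is an ordinary function called while that lock is held, and `try_lock` is used so the demo reports the conflict instead of hanging the way a real blocking acquire would.

```rust
use std::sync::{Mutex, TryLockError};

/// Hypothetical stand-in for an allocator's internal lock.
/// Real allocators are structured differently; this only models the hazard.
static ALLOC_LOCK: Mutex<()> = Mutex::new(());

/// Simulates a signal handler that needs the allocator lock.
/// Returns `false` where a real, blocking handler would deadlock.
fn handler_can_allocate() -> bool {
    match ALLOC_LOCK.try_lock() {
        Ok(_guard) => true,
        Err(TryLockError::WouldBlock) => false,
        Err(TryLockError::Poisoned(_)) => false,
    }
}

/// Simulates `malloc` being interrupted by a signal while it holds its lock.
fn interrupted_malloc() -> bool {
    let _held = ALLOC_LOCK.lock().unwrap();
    // The "signal" arrives here, on the same thread, with the lock still held.
    handler_can_allocate()
}

fn main() {
    println!("handler during malloc:  {}", interrupted_malloc());
    println!("handler outside malloc: {}", handler_can_allocate());
}
```

The interrupted thread cannot release the lock until the handler returns, and a blocking handler cannot return until it gets the lock. This holds whether the signal was raised synchronously or asynchronously, which is the point being made above.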
The segfault here is in the libunwind unwinder itself, and after researching a bit as to what's going on, it looks like the segfault is happening 16 bytes below the end of the stack. I believe the sequence of events can be reconstructed as:
- Using libunwind we can get a handful of frames.
- The frame that segfaults happens when we unwind the first frame of `f`
- The frame `f` faulted in the middle of the function prologue
- The unwind information for `f` is stored in a "compact format"
- The compact format does not have a way to describe how to unwind in the middle of the prologue, instead it only defines how to unwind "during" the function
- In interpreting the compact unwind information libunwind will hit a segfault again, trying to access memory the function itself faulted trying to push.
The issue here is that a stack overflow exception can happen anywhere in the prologue of a function, but unwind tables are generally not designed to describe unwinding from an arbitrary instruction in the prologue (there's the notion of "async unwind tables" on some systems for this). This means that the unwinder can't reliably unwind frames that are interrupted in the prologue.
Oh, what I mean is that to generate a backtrace from a function that segfaulted in its prologue, libunwind needs to know how to unwind from every single instruction in the function, not just the "body" after the prologue. AFAIK that's only supported with async unwind tables (and maybe full DWARF unwind tables?), and I'm not sure how to get LLVM to generate non-compact or async unwind tables.
—alexcrichton
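To make the "every single instruction" point concrete, here is a toy model, not libunwind's actual algorithm. It assumes a classic x86-64 prologue for `f` (`push rbp; mov rbp, rsp`) and compares a body-only rule, which looks for the return address just above the saved frame pointer, with a per-instruction rule of the kind async unwind tables would supply. The `Machine` type, the addresses, and the function names are all made up for illustration, and the model does not reproduce the second segfault itself, only why a body-only rule gives wrong answers mid-prologue.

```rust
use std::collections::HashMap;

/// Simulated machine state: a sparse stack plus the two registers the rules use.
struct Machine {
    mem: HashMap<u64, u64>,
    rsp: u64,
    rbp: u64,
}

impl Machine {
    fn push(&mut self, value: u64) {
        self.rsp -= 8;
        self.mem.insert(self.rsp, value);
    }
    /// `None` models a read from memory that was never written.
    fn read(&self, addr: u64) -> Option<u64> {
        self.mem.get(&addr).copied()
    }
}

/// Body-only rule: the return address sits just above the saved frame pointer.
fn return_addr_body_rule(m: &Machine) -> Option<u64> {
    m.read(m.rbp + 8)
}

/// Per-instruction rule; `step` is the number of prologue instructions
/// of `f` that have already executed.
fn return_addr_per_instruction(m: &Machine, step: usize) -> Option<u64> {
    match step {
        0 => m.read(m.rsp),     // nothing pushed yet
        1 => m.read(m.rsp + 8), // after `push rbp`
        _ => m.read(m.rbp + 8), // after `mov rbp, rsp`: body rule is valid
    }
}

const RET_INTO_MAIN: u64 = 0x1000; // where `g` returns to
const RET_INTO_G: u64 = 0x2000; // where `f` returns to

/// Builds the state `step` instructions into `f`'s prologue, with `g` as caller.
fn state_in_f(step: usize) -> Machine {
    let mut m = Machine { mem: HashMap::new(), rsp: 0x8000, rbp: 0x9000 };
    // `g`'s frame, fully set up.
    m.push(RET_INTO_MAIN);
    m.push(0x9000); // saved frame pointer of `main`
    m.rbp = m.rsp;
    // `call f` pushes the return address into `g`.
    m.push(RET_INTO_G);
    if step >= 1 {
        let caller_rbp = m.rbp;
        m.push(caller_rbp); // push rbp
    }
    if step >= 2 {
        m.rbp = m.rsp; // mov rbp, rsp
    }
    m
}

fn main() {
    for step in 0..3 {
        let m = state_in_f(step);
        println!(
            "step {step}: body rule -> {:x?}, per-instruction -> {:x?}",
            return_addr_body_rule(&m),
            return_addr_per_instruction(&m, step)
        );
    }
}
```

In this model the body-only rule returns the address inside `main` for steps 0 and 1, silently skipping `g`, because `rbp` still belongs to the caller. Only once the prologue has finished do the two rules agree.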
I do not see a reason to close this issue but to be frank, it is the sort of enhancement request that is likely to be open for a long, long time.
Ignoring Apple's compact unwind info, I did expect backtraces to work in the prologue even without asynchronous unwinding support. Asynchronous unwinding is only necessary when popping stack frames and running cleanup code for faults at arbitrary instructions. I very much expect backtrace generation to unconditionally work at arbitrary locations. Sampling profilers depend on this.
Also I believe LLVM is going to stop emitting compact unwind info for rust code or any other code not using the C, C++ or Obj-C personality functions as there is a limit of 3 personality functions in the compact unwind info format and these personality functions take up all room when used in the same executable/dylib.
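The limit of 3 follows from the encoding layout: each 32-bit compact unwind encoding reserves a 2-bit field for a personality index, and the value 0 means "no personality function". The sketch below decodes that field. The mask value is the one I recall from LLVM's `compact_unwind_encoding.h`; the shift constant, the helper names, and the sample encodings are illustrative assumptions rather than anything taken from that header.

```rust
/// Mask for the personality index field (value as in LLVM's
/// `compact_unwind_encoding.h`, to the best of my knowledge).
const UNWIND_PERSONALITY_MASK: u32 = 0x3000_0000;
/// Hypothetical helper constant: position of the lowest bit of the mask.
const PERSONALITY_SHIFT: u32 = 28;

/// Returns the 1-based personality index stored in an encoding,
/// or `None` when the field is 0 (no personality function).
fn personality_index(encoding: u32) -> Option<u32> {
    match (encoding & UNWIND_PERSONALITY_MASK) >> PERSONALITY_SHIFT {
        0 => None,
        n => Some(n),
    }
}

/// Largest index the 2-bit field can hold.
fn max_personalities() -> u32 {
    UNWIND_PERSONALITY_MASK >> PERSONALITY_SHIFT
}

fn main() {
    // Sample encodings are made up; only the personality bits matter here.
    for encoding in [0x0100_0000u32, 0x1100_0000, 0x3100_0000] {
        println!("{encoding:#010x} -> {:?}", personality_index(encoding));
    }
    println!("max distinct personality functions: {}", max_personalities());
}
```

With C, C++ and Obj-C each bringing their own personality function, all three non-zero values can already be taken inside one executable or dylib.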
I do think that we should try to improve the situation, FWIW, and I am aware incremental improvements may be sufficient for many use-cases. It just seems like fixing all this is a nontrivial haul.
> Also I believe LLVM is going to stop emitting compact unwind info for rust code or any other code not using the C, C++ or Obj-C personality functions as there is a limit of 3 personality functions in the compact unwind info format and these personality functions take up all room when used in the same executable/dylib.
Can you confirm this and if so, open a new issue for that?
If I understand correctly, it got merged, then reverted because of a build error, and a revert of the revert has been posted but not yet merged: https://github.com/rust-lang/rust/issues/102754#issuecomment-1580914857
I'm trying to get a backtrace from a SIGSEGV caused by a stack overflow (hitting the guard page). It seems that this is not working on macOS.
My reproduction case:
Output:
Rust version: