rust-lang / backtrace-rs

Backtraces in Rust
https://docs.rs/backtrace

Need to be able to reliably get symbol addrs #520

Open jswrenn opened 1 year ago

jswrenn commented 1 year ago

The documentation for Frame::symbol_address warns:

This will attempt to rewind the instruction pointer returned by ip to the start of the function, returning that value. In some cases, however, backends will just return ip from this function.

Consequently, the following code 'works' on x86_64-unknown-linux-gnu, but not on aarch64-apple-darwin:

use backtrace;
use std::{hint::black_box, ptr, ffi::c_void};

fn main() {
    black_box(function());
}

#[inline(never)]
fn function() {
    let function = function as *const c_void;
    println!("searching for symbol_address={:?}", function);

    backtrace::trace(|frame| {
        println!("unwound to {:?}", frame);
        if ptr::eq(frame.symbol_address(), function) {
            println!("found it!"); // not reached on aarch64-apple-darwin :(
            return false;
        }
        true
    });
}

Is this expected behavior on this platform? If so, is there any way to work around this discrepancy?

In the scoped-trace crate, I use symbol address equality to capture backtraces with limited upper and lower unwinding bounds. I'm hoping to get this crate working on aarch64-apple-darwin.
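To make the use case concrete, here is a minimal sketch of bounding a trace by symbol address equality. It is not the scoped-trace implementation: the `between` helper, the frame ordering (innermost first), and the addresses are all assumptions made for illustration. It models each frame by the value `Frame::symbol_address` would report.

```rust
/// Keep the frames strictly between two boundary functions.
/// `frames` holds one symbol address per frame, innermost first.
/// `inner` and `outer` are the start addresses of the boundary functions.
/// (Hypothetical helper, not part of backtrace-rs or scoped-trace.)
fn between(frames: &[usize], inner: usize, outer: usize) -> Option<&[usize]> {
    // Skip everything up to and including the inner boundary frame.
    let start = frames.iter().position(|&a| a == inner)? + 1;
    // Stop just before the outer boundary frame.
    let len = frames[start..].iter().position(|&a| a == outer)?;
    Some(&frames[start..start + len])
}

fn main() {
    // Made-up symbol addresses: 0x3000 is the inner bound, 0x1000 the outer bound.
    let frames = [0x3000, 0x2000, 0x2400, 0x1000, 0x0800];
    let kept = between(&frames, 0x3000, 0x1000);
    assert_eq!(kept, Some(&[0x2000, 0x2400][..]));
    println!("kept frames: {:x?}", kept);

    // If every frame reports its raw ip instead of the function start,
    // no frame ever equals a boundary address and nothing is captured.
    let raw_ips = [0x3014, 0x2058, 0x241c, 0x10a0, 0x0830];
    assert_eq!(between(&raw_ips, 0x3000, 0x1000), None);
}
```

The second case is what the failing comparison in the example above amounts to on aarch64-apple-darwin.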

bjorn3 commented 1 year ago

On macOS the function to get the address of the enclosing function of an ip address (_Unwind_FindEnclosingFunction) is unreliable due to compact unwind info collapsing multiple functions with identical unwind info together: https://github.com/rust-lang/backtrace-rs/blob/b3e5bb857773fcc6fc8247374264bd4c8acd5387/src/backtrace/libunwind.rs#L58. If the executable is not stripped, you can try parsing the executable itself using e.g. the object crate and finding the last symbol before the ip address.
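The "last symbol before the ip address" lookup can be sketched as a binary search over a sorted symbol table. This is only an illustration: the `Sym` type, the `enclosing_symbol` helper, and the addresses are hypothetical, and the table is hardcoded rather than read from an executable. With a real binary, the ip would also need to be translated to a file-relative address before the lookup.

```rust
/// One symbol-table entry: start address and name.
/// The entries used below are made up; in practice they would be read
/// from the executable (for example with the `object` crate).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Sym {
    addr: u64,
    name: &'static str,
}

/// Return the last symbol whose start address is <= `ip`.
/// `symbols` must be sorted by `addr` in ascending order.
fn enclosing_symbol(symbols: &[Sym], ip: u64) -> Option<Sym> {
    // partition_point returns how many leading entries satisfy addr <= ip.
    let idx = symbols.partition_point(|s| s.addr <= ip);
    // idx == 0 means `ip` lies before the first symbol.
    idx.checked_sub(1).map(|i| symbols[i])
}

fn main() {
    let symbols = [
        Sym { addr: 0x1000, name: "main" },
        Sym { addr: 0x1040, name: "function" },
        Sym { addr: 0x10c0, name: "helper" },
    ];
    let hit = enclosing_symbol(&symbols, 0x1058).unwrap();
    assert_eq!(hit.name, "function");
    println!("0x1058 is inside {} (starts at {:#x})", hit.name, hit.addr);
    assert_eq!(enclosing_symbol(&symbols, 0x0fff), None);
}
```

Because this sketch ignores symbol sizes, an ip that falls in a gap after the end of one function is still attributed to that function.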

jswrenn commented 1 year ago

If the executable is not stripped, you can try parsing the executable itself using e.g. the object crate and finding the last symbol before the ip address.

That's not too bad! Would backtrace-rs accept a PR implementing this?

philipc commented 1 year ago

The symbolization already does that. I wonder why Symbol::addr doesn't return the address for symtab entries.

jswrenn commented 1 year ago

If I had to guess, it's because it looks like backtrace-rs currently uses information from DWARF xor symtab entries — not both. In a situation where DWARF debuginfo was completely unavailable, Frame::symbol_address might behave as expected.

philipc commented 1 year ago

backtrace-rs falls back to symtab entries if it can't find a DWARF entry. But both of those are only used in the symbolizer. Frame::symbol_address only uses the unwinder. It doesn't and shouldn't use DWARF or symbol table entries. You need to resolve the frame if you want to use those.

workingjubilee commented 1 year ago

It seems like everything is working as intended, then? Shall we close this?

jswrenn commented 1 year ago

@workingjubilee Maaaybe? The comment here: https://github.com/rust-lang/backtrace-rs/blob/b3e5bb857773fcc6fc8247374264bd4c8acd5387/src/backtrace/libunwind.rs#L58-L68 ...uses the phrase "if this is fixed" — which suggests that something unwelcome (albeit not unknown) is happening here.

Could we document this shortcoming? Or even make it explicit in the API by making ip an Option? Could we even eliminate this shortcoming? E.g.:

I would almost rather backtrace-rs used the unreliable output of _Unwind_FindEnclosingFunction; then at least symbol_address would produce sometimes-useful results on macOS, rather than always-useless (i.e., not more useful than ip) results.

bjorn3 commented 1 year ago

Compact unwinding can be disabled when linking a binary or library, but when compact unwinding was enabled when linking (as is done for all system libraries and by default for user code), there are no DWARF unwinding tables remaining.

I almost would rather if backtrace-rs used the unreliable output _Unwind_FindEnclosingFunction — then at least ip would produce sometimes useful results on macOS, rather than always-useless (i.e., not more useful than sp) results.

I did expect the current output to be useful for looking up in the symbol table which should always give the correct result if the symbol table exists at all. The result of _Unwind_FindEnclosingFunction may result in the wrong function without any option to get the correct result using the symbol table.

jswrenn commented 1 year ago

(Whoops, edited my last comment because I got my function names mixed up.)

I did expect the current output to be useful for looking up in the symbol table which should always give the correct result if the symbol table exists at all.

Am I right to think that you could instead use ip in this case?

Alternatively, could backtrace-rs do that look-up into the symbol table?

bjorn3 commented 1 year ago

Am I right to think that you could instead use ip in this case?

Right, you could.

Alternatively, could backtrace-rs do that look-up into the symbol table?

I think that would make sense.

workingjubilee commented 1 year ago

The result of _Unwind_FindEnclosingFunction may result in the wrong function without any option to get the correct result using the symbol table.

Yeah, I think that kills the idea of using that on macOS dead. Any guess that might be wrong seems like it kinda breaks with what symbol_address says it does: it says it rewinds to the start of the function (implicit: correctly) or stays equal to ip, allowing you to detect which happens. It's better to simply return a value equal to ip if we're not going to produce a guaranteed-useful answer.
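The "detect which happens" part could look roughly like the sketch below. The `SymbolAddress` enum and `classify` helper are hypothetical names, and the addresses are made up; the only assumption taken from the discussion is that a backend either rewinds to the function start or returns ip unchanged.

```rust
/// What the unwinder reported for a frame (hypothetical type for illustration).
#[derive(Debug, PartialEq)]
enum SymbolAddress {
    /// The backend rewound to an address different from `ip`.
    Rewound(usize),
    /// The backend returned `ip` unchanged, so no function start is known.
    SameAsIp,
}

/// Compare the two values a frame reports and say which case applies.
fn classify(ip: usize, symbol_address: usize) -> SymbolAddress {
    if symbol_address == ip {
        SymbolAddress::SameAsIp
    } else {
        SymbolAddress::Rewound(symbol_address)
    }
}

fn main() {
    // Made-up values modelling a backend that can rewind...
    assert_eq!(classify(0x1058, 0x1040), SymbolAddress::Rewound(0x1040));
    // ...and one that just hands back ip.
    assert_eq!(classify(0x1058, 0x1058), SymbolAddress::SameAsIp);
    println!("{:?}", classify(0x1058, 0x1040));
}
```

With the real crate, the inputs would be `frame.ip() as usize` and `frame.symbol_address() as usize`. A guessed but wrong address would land in the `Rewound` case and be indistinguishable from a correct one, which is the argument against using an unreliable result.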

Regarding doing the table lookup implicitly, I don't think it should be completely off the (heh) table, but I'm slightly concerned about, and would like to hear an elaboration on, @philipc's perspective, namely:

It doesn't and shouldn't use DWARF or symbol table entries. You need to resolve the frame if you want to use those.

I can guess why this was said, but it's likely there's a nuance that hasn't been stated explicitly and that might be missing from the conversation so far.

philipc commented 1 year ago

I don't see any technical reason why the unwinder couldn't use the symbol table, but from a design perspective, this is something that the resolver is intended to do and already has code for, so I don't think it should be duplicated in the unwinder. I haven't seen a reason why the resolver can't be used in this case, but I haven't looked into the motivating use case (scoped-trace) at all.

philipc commented 1 year ago

While I think the resolver should be used for this purpose, I don't think it works correctly currently. Symbol::addr is documented to return the starting address of the function, and appears to do this for dbghelp, but it returns the unrelocated IP minus one for DWARF, and None for symbol tables.

workingjubilee commented 1 year ago

Returning None seems okay, at least, in the sense that it's useless but not wrong. But the DWARF response seems simply incorrect.

workingjubilee commented 1 year ago

This issue is no longer about Frame::symbol_address, which should probably remain untouched. Rather, it is about having a function that answers the desired use-case at all, is correct across platforms, and tries its alternatives until it succeeds or fails.
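As a rough sketch of "tries its alternatives until it succeeds or fails": the function below runs a list of lookup strategies in order and returns the first answer. Nothing here is an existing backtrace-rs API; `function_start`, the `Strategy` alias, and both closures are hypothetical stand-ins for an unwinder-based lookup and a symbol-table lookup.

```rust
/// A lookup strategy: given an ip, return the start of the enclosing
/// function if this strategy can determine it. (Hypothetical alias.)
type Strategy<'a> = &'a dyn Fn(usize) -> Option<usize>;

/// Try each strategy in order and return the first successful answer.
fn function_start(ip: usize, strategies: &[Strategy<'_>]) -> Option<usize> {
    strategies.iter().find_map(|lookup| lookup(ip))
}

fn main() {
    // Stand-in for an unwinder that cannot rewind on this platform.
    let unwinder = |_ip: usize| -> Option<usize> { None };
    // Stand-in for a symbol table with one function at 0x1040..0x10c0.
    let symtab = |ip: usize| -> Option<usize> {
        if (0x1040..0x10c0).contains(&ip) { Some(0x1040) } else { None }
    };

    let strategies: [Strategy<'_>; 2] = [&unwinder, &symtab];
    // The unwinder gives up, so the symbol table answers.
    assert_eq!(function_start(0x1058, &strategies), Some(0x1040));
    // No strategy knows this address, so the overall result is None.
    assert_eq!(function_start(0x9000, &strategies), None);
    println!("{:x?}", function_start(0x1058, &strategies));
}
```

Returning `None` when every strategy fails keeps the "useless but not wrong" property discussed above.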

jswrenn commented 1 year ago

Yes, that sounds great. Again, for context: In the scoped-trace crate, I use symbol address equality to capture backtraces with limited upper and lower unwinding bounds. So I'd like to be able to call this function without doing full symbol resolution, or in situations where only symbol tables are available and not full DWARF debuginfo.