rust-lang / backtrace-rs

Backtraces in Rust
https://docs.rs/backtrace

Rewrite msvc backtrace support to be much faster on 64-bit platforms #569

Closed. wesleywiser closed this pull request 10 months ago.

wesleywiser commented 10 months ago

Currently, stack backtraces are captured on Windows by calling into dbghelp!StackWalkEx (or dbghelp!StackWalk64 if the loaded version of dbghelp is too old to contain that function). This is very convenient, since StackWalkEx handles everything for us, but it has two drawbacks:

  1. dbghelp is not safe to use from multiple threads at the same time, so all calls into it must be serialized.
  2. StackWalkEx returns inlined frames as if they were regular stack frames, which requires loading debug info just to walk the stack. As a result, simply capturing a backtrace without resolving it is much more expensive on Windows than on *nix.
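Point 1 is the kind of constraint usually handled with a single process-wide lock around every call into the library. The sketch below is a minimal illustration under stated assumptions: `DBGHELP_LOCK`, `with_dbghelp`, `fake_stack_walk`, and `run_concurrently` are hypothetical names, no real dbghelp function is called, and the atomic counters exist only to show that no two threads are ever inside the guarded region at once. It is not backtrace-rs's actual locking code.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;
use std::thread;

/// Hypothetical global lock guarding every call into a library that is not
/// thread-safe (standing in for dbghelp.dll).
static DBGHELP_LOCK: Mutex<()> = Mutex::new(());

/// Runs `f` while holding the global lock, so at most one thread is inside
/// the library at any moment.
fn with_dbghelp<R>(f: impl FnOnce() -> R) -> R {
    // Poisoning only means another thread panicked while holding the lock;
    // the guarded data is `()`, so it is safe to continue.
    let _guard = DBGHELP_LOCK.lock().unwrap_or_else(|e| e.into_inner());
    f()
}

// Instrumentation for the demo: current and peak number of threads "inside".
static INSIDE: AtomicUsize = AtomicUsize::new(0);
static MAX_INSIDE: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for a dbghelp call such as a stack walk.
fn fake_stack_walk() {
    let now = INSIDE.fetch_add(1, Ordering::SeqCst) + 1;
    MAX_INSIDE.fetch_max(now, Ordering::SeqCst);
    thread::yield_now(); // widen the window in which an overlap could occur
    INSIDE.fetch_sub(1, Ordering::SeqCst);
}

/// Hammers the wrapper from several threads and reports the peak concurrency.
fn run_concurrently(threads: usize, calls: usize) -> usize {
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            thread::spawn(move || {
                for _ in 0..calls {
                    with_dbghelp(fake_stack_walk);
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    MAX_INSIDE.load(Ordering::SeqCst)
}

fn main() {
    println!("max threads inside at once: {}", run_concurrently(8, 100));
}
```

The price of this design is that every capture, on every thread, contends for the same lock, which is one reason avoiding dbghelp during capture is attractive.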

This change rewrites our Windows support to call RtlVirtualUnwind instead on platforms that support this API (x86_64 and aarch64). RtlVirtualUnwind unwinds the actual (i.e., non-inlined) stack frames using the unwind information already present in the binary, so it does not require loading any debug info and is significantly faster. On platforms that do not support RtlVirtualUnwind (i.e., i686), we fall back to the current implementation, which calls into dbghelp.
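To see why table-based unwinding needs no debug info, here is a toy model of the loop: find the function entry covering the current instruction pointer (the role RtlLookupFunctionEntry typically plays alongside RtlVirtualUnwind), use that entry's frame layout to locate the caller's return address, and repeat. Everything here is an assumption for illustration: `FunctionEntry`, `lookup_function_entry`, `walk`, `demo`, and the stack layout are hypothetical and far simpler than real x64/aarch64 unwind data, and no Windows API is called. Inlined functions have no entry of their own, so they never show up as frames, which matches the behavior described above.

```rust
/// Hypothetical, heavily simplified unwind-table entry: one per real
/// (non-inlined) function. It only records how many stack slots the
/// function's prologue reserved below its return address.
struct FunctionEntry {
    begin: u64,
    end: u64,
    frame_slots: usize,
}

/// Plays the role of a function-table lookup: find the entry covering `ip`.
fn lookup_function_entry(table: &[FunctionEntry], ip: u64) -> Option<&FunctionEntry> {
    table.iter().find(|f| f.begin <= ip && ip < f.end)
}

/// Walks a simulated stack (index 0 is the top). Each frame is `frame_slots`
/// locals followed by the return address into the caller; a return address
/// of 0 marks the bottom of the stack. Only unwind tables are consulted:
/// no symbol names, line tables, or other debug info.
fn walk(table: &[FunctionEntry], stack: &[u64], mut ip: u64, mut sp: usize) -> Vec<u64> {
    let mut frames = Vec::new();
    loop {
        frames.push(ip);
        // This toy model simply stops at code without unwind info.
        let Some(entry) = lookup_function_entry(table, ip) else { break };
        let ret_slot = sp + entry.frame_slots;
        let Some(&ret) = stack.get(ret_slot) else { break };
        if ret == 0 {
            break;
        }
        ip = ret; // resume in the caller...
        sp = ret_slot + 1; // ...with the return address popped
    }
    frames
}

/// Hypothetical three-function program: `bar` was called by `foo`,
/// which was called by `main`.
fn demo() -> Vec<u64> {
    let table = [
        FunctionEntry { begin: 0x1000, end: 0x1100, frame_slots: 2 }, // main
        FunctionEntry { begin: 0x2000, end: 0x2100, frame_slots: 1 }, // foo
        FunctionEntry { begin: 0x3000, end: 0x3100, frame_slots: 3 }, // bar
    ];
    // bar's 3 locals, return into foo, foo's 1 local, return into main,
    // main's 2 locals, then the bottom-of-stack sentinel.
    let stack: [u64; 9] = [7, 7, 7, 0x2020, 7, 0x1030, 7, 7, 0];
    walk(&table, &stack, 0x3010, 0)
}

fn main() {
    println!("{:x?}", demo());
}
```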

To recover the inlined frame information when we are asked to resolve symbols, we use SymAddrIncludeInlineTrace to load debug info and detect inlined frames, and then SymQueryInlineTrace to get the inline context needed to resolve them.
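The inline-recovery step can be pictured as expanding one physical frame into several logical ones. The sketch below models that idea with hand-written records instead of real debug info: `InlineRange`, `FunctionInfo`, `inline_depth`, `symbolize`, and `sample` are hypothetical, the function names in `sample` are invented, and dbghelp is not called. `inline_depth` loosely mirrors the question SymAddrIncludeInlineTrace answers (how many inlined frames sit at an address), and `symbolize` stands in for what the inline contexts obtained via SymQueryInlineTrace make possible: reporting innermost-first logical frames for a single address.

```rust
/// Hypothetical debug-info record for an inlined call: the address range the
/// callee's code occupies after being inlined into its caller.
struct InlineRange {
    callee: &'static str,
    begin: u64,
    end: u64,
}

/// Hypothetical debug-info record for one real (physical) function, with the
/// inlined ranges it contains listed outermost first.
struct FunctionInfo {
    name: &'static str,
    begin: u64,
    end: u64,
    inlined: Vec<InlineRange>,
}

/// Number of inlined frames covering `addr` inside function `f`.
fn inline_depth(f: &FunctionInfo, addr: u64) -> usize {
    f.inlined.iter().filter(|r| r.begin <= addr && addr < r.end).count()
}

/// Expands one physical frame into its logical frames, innermost first.
fn symbolize(funcs: &[FunctionInfo], addr: u64) -> Vec<&'static str> {
    let Some(f) = funcs.iter().find(|f| f.begin <= addr && addr < f.end) else {
        return Vec::new(); // no debug info covers this address
    };
    let mut frames: Vec<&'static str> = f
        .inlined
        .iter()
        .filter(|r| r.begin <= addr && addr < r.end)
        .map(|r| r.callee)
        .collect();
    frames.reverse(); // innermost inlined callee first
    frames.push(f.name); // then the physical function containing them
    frames
}

/// Hypothetical debug info: `helper` inlined into `main`, `leaf` into `helper`.
fn sample() -> Vec<FunctionInfo> {
    vec![FunctionInfo {
        name: "main",
        begin: 0x1000,
        end: 0x1100,
        inlined: vec![
            InlineRange { callee: "helper", begin: 0x1020, end: 0x1060 },
            InlineRange { callee: "leaf", begin: 0x1030, end: 0x1040 },
        ],
    }]
}

fn main() {
    let funcs = sample();
    for addr in [0x1034, 0x1050, 0x1080] {
        println!(
            "{addr:#x}: {} inlined -> {:?}",
            inline_depth(&funcs[0], addr),
            symbolize(&funcs, addr)
        );
    }
}
```

Because the expansion needs only an address plus debug info, the same step works whether or not the address came from a fresh stack walk.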

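The result is a significant performance improvement for both backtrace capture and symbolization on Windows: capturing a backtrace without resolving it becomes more than two orders of magnitude faster, and the benchmarks that also resolve symbols improve as well!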

Before:

> cargo +nightly bench
     Running benches\benchmarks.rs

running 6 tests
test new                                 ... bench:     658,652 ns/iter (+/- 30,741)
test new_unresolved                      ... bench:     343,240 ns/iter (+/- 13,108)
test new_unresolved_and_resolve_separate ... bench:     648,890 ns/iter (+/- 31,651)
test trace                               ... bench:     304,815 ns/iter (+/- 19,633)
test trace_and_resolve_callback          ... bench:     463,645 ns/iter (+/- 12,893)
test trace_and_resolve_separate          ... bench:     474,290 ns/iter (+/- 73,858)

test result: ok. 0 passed; 0 failed; 0 ignored; 6 measured; 0 filtered out; finished in 8.26s

After:

> cargo +nightly bench
     Running benches\benchmarks.rs

running 6 tests
test new                                 ... bench:     495,468 ns/iter (+/- 31,215)
test new_unresolved                      ... bench:       1,241 ns/iter (+/- 251)
test new_unresolved_and_resolve_separate ... bench:     436,730 ns/iter (+/- 32,482)
test trace                               ... bench:         850 ns/iter (+/- 162)
test trace_and_resolve_callback          ... bench:     410,790 ns/iter (+/- 19,424)
test trace_and_resolve_separate          ... bench:     408,090 ns/iter (+/- 29,324)

test result: ok. 0 passed; 0 failed; 0 ignored; 6 measured; 0 filtered out; finished in 7.02s
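The size of the change is easier to see as per-benchmark speedup ratios (before ÷ after). The sketch below only divides the mean ns/iter values copied from the two runs above; it ignores the +/- spreads, some of which are large (for example +/- 73,858 ns on the earlier `trace_and_resolve_separate` run), so the ratios for the resolving benchmarks should be read as approximate. `RESULTS` and `speedup` are illustrative names, not part of the benchmark suite.

```rust
/// Mean ns/iter values copied from the "Before" and "After" runs above:
/// (benchmark name, before, after).
const RESULTS: [(&str, u64, u64); 6] = [
    ("new", 658_652, 495_468),
    ("new_unresolved", 343_240, 1_241),
    ("new_unresolved_and_resolve_separate", 648_890, 436_730),
    ("trace", 304_815, 850),
    ("trace_and_resolve_callback", 463_645, 410_790),
    ("trace_and_resolve_separate", 474_290, 408_090),
];

/// How many times faster the "after" run is than the "before" run.
fn speedup(before_ns: u64, after_ns: u64) -> f64 {
    before_ns as f64 / after_ns as f64
}

fn main() {
    for (name, before, after) in RESULTS {
        println!("{name:<40} {:>8.1}x", speedup(before, after));
    }
}
```

With these numbers, `trace` and `new_unresolved` come out at roughly 359x and 277x, while the four benchmarks that resolve symbols land between about 1.1x and 1.5x.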

The changes to the symbolization step also allow us to report inlined frames when resolving from just an instruction address, which was not previously possible.

github-actions[bot] commented 10 months ago

Code size changes for a hello-world Rust program linked with libstd with backtrace:

On platform ubuntu-latest:

On platform windows-latest:

wesleywiser commented 10 months ago

This will also fix https://github.com/rust-lang/rust/issues/116403 by virtue of not using dbghelp.dll during the initial backtrace capture.