oxidecomputer / helios

Helios: Or, a Vision in a Dream. A Fragment.
Mozilla Public License 2.0

Figure out pstack and Rust strip = "symbols" interaction #147

Open sunshowers opened 6 months ago

sunshowers commented 6 months ago

(writing up a quick summary so I don't forget about this)

@jclulow reported to me that a hung nextest process didn't show its symbols in pstack. It turned out that was because the build process we were using within nextest was stripping all symbols.

I tried switching to strip = "debuginfo" but that didn't seem to help. The only thing that caused stacks to show up in pstack was strip = "none".

In https://github.com/nextest-rs/nextest/commit/d4f982b3184f07ff5c40cc90c52d3fc6567be0b9#commitcomment-140289873, Taiki says he believes this is a bug (there was apparently a similar bug in MSVC as well). I tried setting up a small project to investigate:

https://github.com/sunshowers/pstack-test

but found that even with the strip-symbols profile (which activates strip = "symbols"), pstack could show function symbols. Not quite clear why that would be happening though!
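For anyone trying to reproduce this, a minimal sketch of the kind of program that is useful here follows. It is a hypothetical stand-in, not the contents of pstack-test: the function names and the argument handling are made up for illustration, and the profile body shown in the comment is an assumption (only the `strip-symbols` name and `strip = "symbols"` come from the description above).

```rust
// Hypothetical reproduction binary; NOT the actual contents of pstack-test.
//
// Assumed Cargo.toml profile (the name is from the thread, the body is a guess):
//
//   [profile.strip-symbols]
//   inherits = "release"
//   strip = "symbols"

use std::env;
use std::thread;
use std::time::Duration;

/// Innermost frame: sleeps so a stack can be captured while the thread is parked.
#[inline(never)]
fn innermost_frame(park: Duration) -> u32 {
    thread::sleep(park);
    1
}

/// Middle frame. Adding to the result after the call keeps this from being a tail call.
#[inline(never)]
fn middle_frame(park: Duration) -> u32 {
    innermost_frame(park) + 1
}

/// Outermost frame below `main`.
#[inline(never)]
fn outer_frame(park: Duration) -> u32 {
    middle_frame(park) + 1
}

fn main() {
    // First argument: number of seconds to stay parked. Defaults to 0 (return immediately).
    let secs: u64 = env::args()
        .nth(1)
        .and_then(|s| s.parse().ok())
        .unwrap_or(0);

    let depth = outer_frame(Duration::from_secs(secs));
    println!("returned through {} frames", depth);
}
```

Built with `cargo build --profile strip-symbols` and started with a long sleep (for example, `600` as the first argument), the process can be inspected from another terminal with `pstack <pid>`. If symbols survive, the three `*_frame` functions should be recognizable in the output. If they were stripped, only raw addresses are expected.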

taiki-e commented 6 months ago

> but found that even with the strip-symbols profile (which activates strip = "symbols"), pstack could show function symbols. Not quite clear why that would be happening though!

Does this mean there is a problem only with strip=debuginfo?

Also, is this a problem that only occurs with cross-compilation?

If both are true, then since only strip=debuginfo uses /usr/bin/strip, that would be consistent with my guess in https://github.com/nextest-rs/nextest/commit/d4f982b3184f07ff5c40cc90c52d3fc6567be0b9#commitcomment-140325483 that a non-illumos host's /usr/bin/strip is causing some problems.

sunshowers commented 6 months ago

> Does this mean there is a problem only with strip=debuginfo?

Ah no, I mean that no matter what I set strip to, I couldn't reproduce the results.

I do think this is likely a cross-compilation issue as you pointed out -- I was doing my builds natively on illumos but the CI that runs strip was doing a cross-compile from Linux.

sunshowers commented 6 months ago

@taiki-e -- so what can we do here? Is this something where rustc can help, or maybe cross?

taiki-e commented 6 months ago

For now, I think it makes sense to wait for discussion at the next compiler-team meeting (https://github.com/rust-lang/rust/issues/123151#issuecomment-2025900827).

If the decision is made to use llvm-strip, the fix will likely be backported to stable (as it also affects macOS, which is a tier 1 target).

workingjubilee commented 6 months ago

Huh, reading this: https://illumos.org/books/dev/debugging.html

> Let's take a look now at pstack. As long as sufficient information for debugging is present in a binary, pstack can tell you what's going on. pstack doesn't rely on DWARF, it simply needs to have access to the symbol table and a frame pointer. If it's software in illumos, you can rest assured that it comes this way by default.

pstack doesn't rely on DWARF? oh dear... I think our backtrace testing infra assumes that you're using the DWARF-centric unwinding and symbolication strategy if you're a Unix...