rubrikinc / wachy

A UI for eBPF-based performance debugging
https://rubrikinc.github.io/wachy/

Wachy panics trying to load Rust binary. #8

Open tobz opened 2 years ago

tobz commented 2 years ago

Context

toby@foo:~/src/wachy$ rustc -V
rustc 1.62.0 (a8314ef7d 2022-06-27)

toby@foo:~/src/wachy$ target/release/wachy -V
wachy 0.1.0-alpha.6

toby@foo:~/src/wachy$ bpftrace -V
bpftrace v0.14.0

toby@foo:~/src/wachy$ file ~/src/vector/target/release/vector
/home/toby/src/vector/target/release/vector: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=85a6094a204866a55d1ce1d43ef267036612ea0f, for GNU/Linux 3.2.0, with debug_info, not stripped

toby@foo:~/src/wachy$ WACHY_LOG=trace sudo -E target/release/wachy ~/src/vector/target/release/vector "vector::internal_telemetry::allocations::Producer<_,_,T>::write"
Error: Panic! [v0.1.0-alpha.6]
Cause: panicked at 'called `Option::unwrap()` on a `None` value', src/controller.rs:443:53

It throws this panic whether or not I run under sudo, and also with older versions of bpftrace (I tried v0.11.x). It's a bit hard to grok what the code is doing at that location, so I didn't have any great immediate ideas for debugging further, given that it seems to find the symbol.

viveksjain commented 2 years ago

The code is rather lacking in error messages and comments, sorry :). I believe this line has found a call instruction inside the function and is trying to map it back to the source file/line. The file output says the binary has debug symbols, but I'm not sure if something is getting partially stripped, or perhaps you depend on some library which is getting inlined and doesn't have symbols? Does it fare better with a debug build of vector? Alternatively, could you provide instructions on how to repro (i.e. how to build vector) and I can try to take a look.
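To illustrate the failure mode described here, below is a minimal sketch of an address-to-line lookup that returns an `Option`, plus a wrapper that turns a miss into a descriptive error instead of calling `unwrap()`. This is not wachy's actual code: `SourceLoc`, `LineTable`, and `resolve_call_site` are hypothetical names, and the table is a plain `BTreeMap` standing in for real DWARF line info.

```rust
use std::collections::BTreeMap;

/// Hypothetical source location type (an assumption, not wachy's own).
#[derive(Debug, Clone, PartialEq)]
pub struct SourceLoc {
    pub file: String,
    pub line: u32,
}

/// Hypothetical address-to-line table, keyed by the start address of each row.
pub struct LineTable {
    rows: BTreeMap<u64, SourceLoc>,
    /// Exclusive end address of the last row.
    end: u64,
}

impl LineTable {
    pub fn new(rows: Vec<(u64, &str, u32)>, end: u64) -> Self {
        let rows = rows
            .into_iter()
            .map(|(addr, file, line)| (addr, SourceLoc { file: file.to_string(), line }))
            .collect();
        LineTable { rows, end }
    }

    /// Returns the row covering `addr`, or `None` if the address falls
    /// before the first row or at/after the end of the table.
    pub fn lookup(&self, addr: u64) -> Option<&SourceLoc> {
        if addr >= self.end {
            return None;
        }
        // Last row whose start address is <= addr.
        self.rows.range(..=addr).next_back().map(|(_, loc)| loc)
    }
}

/// Maps a call instruction's address back to a source location, reporting
/// the address and function name on failure rather than panicking.
pub fn resolve_call_site(
    table: &LineTable,
    function: &str,
    addr: u64,
) -> Result<SourceLoc, String> {
    table.lookup(addr).cloned().ok_or_else(|| {
        format!("no line info for call instruction at {:#x} in `{}`", addr, function)
    })
}
```

With an error like this, a release build that lacks line info for some call site would report the offending address instead of a bare `Option::unwrap()` panic.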

tobz commented 2 years ago

I guess it didn't necessarily occur to me to try it with an honest-to-god debug build, so let me give that a shot.

Beyond that, maybe there's some inlining happening from the outside? I'll try to track that down as well.

tobz commented 2 years ago

Alright, had a chance to try this out.

I did a normal debug build, and then a release build where debug = true in the Cargo release profile:

toby@foo:~/src/wachy$ file ~/src/vector/target/release/vector
/home/toby/src/vector/target/release/vector: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=65c02f84ccab09cfba9370899b5d05a26191b9d7, for GNU/Linux 3.2.0, with debug_info, not stripped
toby@foo:~/src/wachy$ file ~/src/vector/target/debug/vector
/home/toby/src/vector/target/debug/vector: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=71ef74b7d56745b8a20794530ff5bef1afe56e6f, for GNU/Linux 3.2.0, with debug_info, not stripped

Using the same command as in the issue description, the debug build gets me to the UI where I can select which specific matching symbol I want, etc. The release build panics before even opening the UI. With the debug build I do get an error about the bpftrace command syntax missing a brace or something, but I still consider this progress. 😄

Now, I guess my thought is: how do I try to figure out what is missing that Wachy needs? Any sleuthing I could try and do with nm? I can also give you instructions to build Vector -- that's easy -- but I don't want to take up a bunch of your time building and debugging Vector, etc.

viveksjain commented 2 years ago

Hmm, can you post the bpftrace syntax error? That's probably a separate issue. alpha.6 will already contain a fix for #2, but maybe there are more things to fix.

So it seems the original issue is related to the release build, although I'm not sure why that's the case if you have debug=true. Unfortunately this part of wachy doesn't have good logs, so it's probably easiest for me to debug independently. The error is in the address-to-line mappings. What I'd probably do is add logs to figure out which address/ip it's crashing at, then check gdb disassemble /s <mangled-symbol-name> for the symbol corresponding to vector::internal_telemetry::allocations::Producer<_,_,T>::write, and then…just try to find clues to what's going on (did some function get inlined?). I reckon the answer will be related to how rustc works, but it requires some digging.
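The "add logs to figure out which address it's crashing at" step could look roughly like the sketch below. It is only an illustration under stated assumptions: `Row`, `line_for`, and `unmapped_call_sites` are hypothetical names, and the line table is modeled as a sorted slice of non-overlapping address ranges rather than whatever wachy uses internally. It logs every call-site address that has no line entry, in hex, so the addresses can be compared against the gdb disassembly.

```rust
/// Hypothetical line-table row: addresses in [start, end) map to `line`.
#[derive(Debug, Clone, Copy)]
pub struct Row {
    pub start: u64,
    pub end: u64,
    pub line: u32,
}

/// Looks up `addr` in rows sorted by `start` and non-overlapping.
/// Returns `None` when the address falls in a gap or outside the table.
pub fn line_for(rows: &[Row], addr: u64) -> Option<u32> {
    // Number of rows whose start is <= addr; the candidate is the last of them.
    let idx = rows.partition_point(|r| r.start <= addr);
    let row = rows.get(idx.checked_sub(1)?)?;
    (addr < row.end).then(|| row.line)
}

/// Returns the call-site addresses with no line info, logging each to stderr.
pub fn unmapped_call_sites(rows: &[Row], call_sites: &[u64]) -> Vec<u64> {
    call_sites
        .iter()
        .copied()
        .filter(|&addr| {
            let missing = line_for(rows, addr).is_none();
            if missing {
                eprintln!("no line info for call at {:#x}", addr);
            }
            missing
        })
        .collect()
}
```

Collecting all misses rather than stopping at the first one makes it easier to see whether the problem is a single odd instruction or a whole region of the function without line info.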