orhun / binsider

Analyze ELF binaries like a boss 😼🕵️‍♂️
https://binsider.dev/
Apache License 2.0

Consider demangling symbols #67

Open HadrienG2 opened 1 month ago

HadrienG2 commented 1 month ago

Is your feature request related to a problem? Please describe.

Symbols from C++ and Rust programs (or any other ahead-of-time compiled language that uses the Itanium name mangling ABI) can be quite hard to map back to source code without demangling, especially in the presence of generics.

Describe the solution you'd like

It would be nice if binsider had an on-by-default option to demangle Itanium ABI symbols. The option could be toggled either via CLI or via a TUI shortcut. I've been using cpp_demangle to this end in crofiler as a pure-rust solution and it worked pretty well, although it does have a few edge cases where it does not perfectly match libiberty.

Describe alternatives you've considered

Demangling can also be done in many other ways, such as by binding to libiberty or calling the c++filt utility. Alternatively, you may also decide that demangling is not worth the code complexity cost. Or you may not want to provide an option to disable it for UI simplicity. I've seen a few demangling hiccups in tools I use (especially perf), which is why I think it's good to have a way to turn it off.

If you do want to have demangling, another UI design option besides an on/off TUI shortcut would be to have two columns in the symbol table, one with the mangled name and one with the non-mangled name, but I think this table is already a bit crowded for that...

Additional context

Prior art of common ELF-wrangling tools that can perform demangling and do so by default, with an option to disable it, includes the perf profiler and the GDB debugger.

orhun commented 1 month ago

Hello 👋🏼 Thanks for the suggestion! I think it very much makes sense :)

It would be nice if binsider had an on-by-default option to demangle Itanium ABI symbols. The option could be toggled either via CLI or via a TUI shortcut.

Yup, that's what I was thinking. Simply add another key binding (maybe m) for enabling/disabling demangling.
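A minimal sketch of such a toggle, assuming a hypothetical `SymbolView` state struct and a hypothetical `--no-demangle` CLI flag (neither is binsider's actual code): demangling starts enabled, the flag turns it off at startup, and the `m` key flips it at runtime.

```rust
/// Hypothetical view state for the symbol table (not binsider's real types).
#[derive(Debug)]
struct SymbolView {
    /// Whether demangled names are shown; on by default.
    demangle: bool,
}

impl SymbolView {
    /// Builds the initial state from CLI arguments. `--no-demangle` is a
    /// hypothetical flag name used only for illustration.
    fn from_args<I: IntoIterator<Item = String>>(args: I) -> Self {
        let disabled = args.into_iter().any(|arg| arg == "--no-demangle");
        SymbolView { demangle: !disabled }
    }

    /// Flips the demangling setting when the `m` key is pressed.
    fn handle_key(&mut self, key: char) {
        if key == 'm' {
            self.demangle = !self.demangle;
        }
    }

    /// Picks the name to display: the demangled one if demangling is on and
    /// a demangled form is available, otherwise the raw symbol name.
    fn display_name<'a>(&self, raw: &'a str, demangled: Option<&'a str>) -> &'a str {
        match (self.demangle, demangled) {
            (true, Some(name)) => name,
            _ => raw,
        }
    }
}
```

A real implementation would more likely declare the flag in the project's existing argument parser; scanning the raw arguments just keeps the sketch dependency-free.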

I've been using cpp_demangle to this end in crofiler as a pure-rust solution and it worked pretty well, although it does have a few edge cases where it does not perfectly match libiberty.

I haven't worked extensively with mangling libraries/tooling, so I have some questions:

If you do want to have demangling, another UI design option besides an on/off TUI shortcut would be to have two columns in the symbol table, one with the mangled name and one with the non-mangled name, but I think this table is already a bit crowded for that...

Agreed.

orhun commented 1 month ago

Also, I found rustc_demangle.

HadrienG2 commented 1 month ago

I think you may find the name mangling Wikipedia page worth a read. To summarize its key points:

To this, I can add that in the Real World, programs will have a mixture of mangled and non-mangled symbols, because you need non-mangled symbols for interop with C-minded infrastructure, e.g. linkers and loaders. You can handle this in various ways.

In crofiler I went with dumb trial and error: try to demangle the symbol, and if the demangler errors out, keep the name as is. But crofiler only needed to support C++. In your case, since you're building a cross-language tool, it may be better to identify the prefixes associated with the various mangling schemes (e.g. _Z for C++ and _R for modern rustc) and dispatch to the appropriate demangler accordingly.
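A minimal sketch of prefix dispatch combined with the trial-and-error fallback. To stay dependency-free it takes the demanglers as closures; in practice these would wrap crates such as cpp_demangle and rustc_demangle, whose exact APIs are not reproduced here. The `Scheme` enum and the `classify`/`display_name` functions are hypothetical names. Note that rustc's legacy symbols also start with `_Z`, so this split alone does not separate C++ from legacy Rust.

```rust
/// Mangling schemes recognized by prefix (plain ELF symbol names assumed).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Scheme {
    /// Itanium C++ ABI, e.g. `_Z3fooi` (rustc's legacy scheme also uses `_Z`).
    Itanium,
    /// rustc's v0 scheme, whose symbols start with `_R`.
    RustV0,
    /// Anything else, e.g. C symbols such as `main`.
    Unmangled,
}

/// Picks a scheme from the symbol prefix alone.
fn classify(sym: &str) -> Scheme {
    if sym.starts_with("_R") {
        Scheme::RustV0
    } else if sym.starts_with("_Z") {
        Scheme::Itanium
    } else {
        Scheme::Unmangled
    }
}

/// Dispatches to the demangler matching the prefix and falls back to the
/// raw name if that demangler reports an error.
fn display_name<E>(
    sym: &str,
    itanium: impl Fn(&str) -> Result<String, E>,
    rust_v0: impl Fn(&str) -> Result<String, E>,
) -> String {
    let attempt = match classify(sym) {
        Scheme::Itanium => itanium(sym),
        Scheme::RustV0 => rust_v0(sym),
        Scheme::Unmangled => return sym.to_string(),
    };
    attempt.unwrap_or_else(|_| sym.to_string())
}
```

Keeping the fallback even after dispatching matters because a prefix match is only a hint: a reserved C identifier can also start with `_R` or `_Z`, and the demangler is the final judge.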

orhun commented 1 month ago

Thanks for the summary, it made everything more clear :)

But if you're targeting ELF binaries only, that may not be a major concern?

I was actually planning to support more formats in the future, but that shouldn't be a concern for now.

See #26 - I'd love to get your opinion on it as well.

Wikipedia page: modern rustc uses a close cousin of the Itanium rules that has been tweaked to account for C++/Rust differences

Hmm, interesting. I found this RFC, but I'm not quite sure about its latest status.

I went with dumb trial and error: try to demangle the symbol, and if the demangler errors out, keep the name as is. But crofiler only needed to support C++. In your case, since you're building a cross-language tool, it may be better to identify the prefixes associated with the various mangling schemes (e.g. _Z for C++ and _R for modern rustc) and dispatch to the appropriate demangler accordingly.

Yeah, sounds reasonable and I think that's the path that I will be taking.

HadrienG2 commented 1 month ago

See https://github.com/orhun/binsider/issues/26 - I'd love to get your opinion on it as well.

I'm afraid I'm not knowledgeable enough about binary file formats to evaluate how good this abstraction layer is :) If I knew more, my first questions would be...

All that being said, Quarkslab is quite reputable in the French security community, so it does give a good first impression in terms of future maintenance and expected feature completeness.

Wikipedia page: modern rustc uses a close cousin of the Itanium rules that has been tweaked to account for C++/Rust differences

Hmm, interesting. I found this RFC, but I'm not quite sure about its latest status.

Indeed, I've just cross-checked a Rust binary that I built recently, and the mangled symbols still start with _Z, so it seems to me that even if this has been merged into rustc, it may not be on by default yet. The Wikipedia page may need some amending...
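For illustration, one way to tell legacy-mangled Rust symbols apart from C++ ones even though both start with `_Z`: rustc's legacy scheme writes `_ZN`, then length-prefixed path components, then `E`, and the last component is conventionally a hash of the form `h` followed by 16 hex digits. The function below is a hypothetical heuristic (it ignores platform variants of the prefix and any symbol suffixes), not the check any particular demangler performs.

```rust
/// Returns true if `component` looks like rustc's legacy hash: `h` + 16 hex digits.
fn is_rust_hash(component: &str) -> bool {
    component.len() == 17
        && component.starts_with('h')
        && component[1..].chars().all(|c| c.is_ascii_hexdigit())
}

/// Heuristic: `_ZN <len><ident>... E` with nothing after the `E` and a trailing
/// hash component. C++ nested names usually fail one of these checks, e.g.
/// `_ZN3foo3barEv` has a parameter list (`v`) after the `E`.
fn is_legacy_rust_symbol(sym: &str) -> bool {
    let Some(mut rest) = sym.strip_prefix("_ZN") else {
        return false;
    };
    let mut last = "";
    loop {
        if let Some(after) = rest.strip_prefix('E') {
            return after.is_empty() && is_rust_hash(last);
        }
        // Each path component is a decimal length followed by that many bytes.
        let digits = rest.bytes().take_while(|b| b.is_ascii_digit()).count();
        if digits == 0 {
            return false;
        }
        let Ok(len) = rest[..digits].parse::<usize>() else {
            return false;
        };
        // `get` returns None if the length runs past the end of the symbol.
        let Some(component) = rest[digits..].get(..len) else {
            return false;
        };
        last = component;
        rest = &rest[digits + len..];
    }
}
```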

orhun commented 1 month ago

Those are good questions - I think we'll be able to answer them better after starting the implementation.

If it's a common subset approach (only include features that every format supports), do you think that will be good enough for your target audience? And how much functionality will you lose with respect to your current ELF-only approach?

That's my biggest concern, losing some data in the TUI due to the abstraction...

are you ready to handle the extra (optional) complexity in the UI?

Yeah.. or that..
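One way to soften the data-loss concern is to pair a common-subset trait with optional, format-specific capabilities that default to `None`, so the UI only shows extra panes for formats that provide them. This is a hypothetical sketch (`BinaryFile`, `Symbol`, `ElfFile`, `PeFile` and the tab names are made up for illustration) and says nothing about how the abstraction discussed in #26 is actually structured.

```rust
/// Format-neutral symbol record (hypothetical).
#[derive(Debug, Clone, PartialEq)]
struct Symbol {
    name: String,
    address: u64,
}

/// Common subset every format must provide, plus optional extras.
trait BinaryFile {
    fn symbols(&self) -> Vec<Symbol>;

    /// Format-specific data (here: ELF dynamic section entries). Formats that
    /// lack it keep the default `None` instead of forcing a lowest common denominator.
    fn dynamic_entries(&self) -> Option<Vec<String>> {
        None
    }
}

struct ElfFile {
    symbols: Vec<Symbol>,
    dynamic: Vec<String>,
}

struct PeFile {
    symbols: Vec<Symbol>,
}

impl BinaryFile for ElfFile {
    fn symbols(&self) -> Vec<Symbol> {
        self.symbols.clone()
    }
    fn dynamic_entries(&self) -> Option<Vec<String>> {
        Some(self.dynamic.clone())
    }
}

impl BinaryFile for PeFile {
    fn symbols(&self) -> Vec<Symbol> {
        self.symbols.clone()
    }
}

/// The UI builds its tab list from what the file actually provides.
fn tabs(file: &dyn BinaryFile) -> Vec<&'static str> {
    let mut tabs = vec!["Symbols"];
    if file.dynamic_entries().is_some() {
        tabs.push("Dynamic");
    }
    tabs
}
```

The trade-off is the one raised above: nothing ELF-specific is lost, but every optional capability adds a conditional branch to the UI.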

From a Rust perspective, adding a C++ dependency adds some complexity to the build. Have you evaluated how much and are you fine with that?

It can't be worse than the issues that I'm having with Linux-specific dependencies (e.g. lurk-cli) :D

Are there other competing libraries that do a similar job? If so, how do they compare?

I will look into them later on. But either way, doing this for other file formats will require some abstraction. Thanks for sharing the links!