nico / demumble

A better c++filt and a better undname.exe, in one binary.
Apache License 2.0

Demangle RTTI class names #9

Open stevemk14ebr opened 5 years ago

stevemk14ebr commented 5 years ago

RTTI class names that start with .?AV or .?AU (class/struct) are not demangled. This can be fixed by stripping the symbol prefix for RTTI and replacing it with the C++ class prefix.

Example: .?AVCNetMidLayer@@ -> ??0CNetMidLayer@@QAE@XZ
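
A minimal sketch of that prefix swap, for illustration only. The helper name `RttiNameToCtorSymbol` is hypothetical and not part of demumble, and the `QAE@XZ` suffix is simply the one from the example above (it appears to describe a 32-bit public `__thiscall` default constructor), so the demangled output will read as a constructor rather than as a bare class name:

```cpp
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper, not demumble code: rewrites an RTTI type descriptor
// name such as ".?AVCNetMidLayer@@" into a constructor symbol such as
// "??0CNetMidLayer@@QAE@XZ" that existing MS demanglers accept.
std::optional<std::string> RttiNameToCtorSymbol(std::string_view name) {
  constexpr std::string_view kClassPrefix = ".?AV";   // class
  constexpr std::string_view kStructPrefix = ".?AU";  // struct
  const std::string_view head = name.substr(0, kClassPrefix.size());
  if (head != kClassPrefix && head != kStructPrefix)
    return std::nullopt;  // not an RTTI class/struct name

  // Both prefixes are four characters long, so one strip covers both.
  name.remove_prefix(kClassPrefix.size());

  std::string result = "??0";              // MS mangling for a constructor
  result.append(name.data(), name.size()); // e.g. "CNetMidLayer@@"
  result += "QAE@XZ";                      // suffix taken from the example
  return result;
}
```

Note that this workaround discards the class/struct distinction, which is one reason real demangler support is preferable.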

From: https://reverseengineering.stackexchange.com/questions/20516/how-can-i-demangle-the-name-in-an-rtti-type-descriptor and https://github.com/REhints/HexRaysCodeXplorer/blob/5be89aa1d32eeaefb099b838ee5622200eb8a2e9/src/HexRaysCodeXplorer/ObjectExplorer.h#L80

nico commented 5 years ago

Thanks for the report.

I think the .?AV... bit is written by mangleCXXRTTIName() (currently at http://llvm-cs.pcc.me.uk/tools/clang/lib/AST/MicrosoftMangle.cpp#3129), which mangles the contents of the RTTI descriptor (i.e. it's not really a symbol, but data stored in the symbol whose name is computed by mangleCXXRTTI() a bit further down in that file).

undname also can't demangle this (but there's no reason not to do better, of course):

C:\src\demumble>undname .?AVCNetMidLayer@@
Microsoft (R) C++ Name Undecorator
Copyright (C) Microsoft Corporation. All rights reserved.

Undecoration of :- ".?AVCNetMidLayer@@"
is :- ".?AVCNetMidLayer@@"

For Itanium symbols, we're able to demangle both name of the struct and the contents:

C:\src\llvm-mono>..\demumble\demumble.exe _ZTI1H
typeinfo for H

C:\src\llvm-mono>..\demumble\demumble.exe _ZTSFviE
typeinfo name for void (int)

This suggests we should support this for the Microsoft ABI too.

nico commented 5 years ago

One problem is that this makes it a bit hard to detect a mangled string. At the moment, we can look for "?" as prefix on Win and for "_Z" on Itanium.

With this, the prefix on Win can be ".?..." for a tag type (typeid(MyClass)), ".$$B..." for an array type (typeid(MyClass[4])), ".N" (and other built-in type codes) for a built-in type (typeid(double)) -- we basically have to look for a mangled type after every period in the input.
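
To make the detection problem concrete, here is a rough sketch of a prefix classifier. The enum and function names are hypothetical and this is not demumble's actual detection code. It recognizes only the tag-type and array-type forms after a period, and deliberately skips single-letter built-in codes such as ".N", since accepting them would match almost any period followed by a letter:

```cpp
#include <string_view>

enum class ManglingKind { kNone, kItanium, kMicrosoft, kMicrosoftRtti };

// Hypothetical classifier: guesses from the prefix alone what kind of
// mangled name (if any) might start at the beginning of `s`.
ManglingKind ClassifyCandidate(std::string_view s) {
  if (s.substr(0, 2) == "_Z") return ManglingKind::kItanium;
  if (s.substr(0, 1) == "?") return ManglingKind::kMicrosoft;
  if (s.size() >= 2 && s[0] == '.') {
    const std::string_view rest = s.substr(1);
    // ".?..." is a tag type, ".$$B..." is an array type.
    if (rest[0] == '?' || rest.substr(0, 3) == "$$B")
      return ManglingKind::kMicrosoftRtti;
    // Built-in type codes (".N" for double, etc.) are intentionally not
    // matched here: a prefix check cannot tell them apart from ordinary text.
  }
  return ManglingKind::kNone;
}
```

A prefix check like this only nominates candidates; whether the string really is a mangled type still has to be decided by running the demangler on it.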

nico commented 5 years ago

With ad8745b2219 (not production quality) applied locally:

$ buildmac/demumble ".?AVCNetMidLayer@@"
typeinfo name for class CNetMidLayer

nico commented 5 years ago

Upstream bit: https://reviews.llvm.org/D67851

stevemk14ebr commented 5 years ago

Thank you for working on this

nico commented 5 years ago

Trunk now demangles rtti descriptor names when you pass them directly:

$ ./demumble .?AVCNetMidLayer@@
class CNetMidLayer `RTTI Type Descriptor Name'

Adding it in streaming mode (echo .?AVCNetMidLayer@@ | ./demumble) is a bit tricky since . is such a common character. If I do what's in ad8745b221916, then echo ._Z1fv | ./demumble goes from .f() to ._Z1fv: the . now triggers an MS demangling attempt (the characters in "_Z1fv" are all valid MS mangling chars), and on demangling failure demumble currently prints the whole candidate string and advances past it.

I could make it so that on demangling fail, we consume just one char instead or something.
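
A sketch of what "consume just one char on failure" could look like. Everything here is an assumption for illustration: `DemangleStream`, `IsManglingChar`, and `FakeDemangle` are hypothetical names, the character set is simplified, and `FakeDemangle` is a lookup-table stand-in for a real demangler so the example is self-contained:

```cpp
#include <cctype>
#include <cstddef>
#include <functional>
#include <optional>
#include <string>
#include <string_view>

using Demangler =
    std::function<std::optional<std::string>(std::string_view)>;

// Simplified guess at which characters may continue a mangled name.
bool IsManglingChar(char c) {
  return std::isalnum(static_cast<unsigned char>(c)) || c == '_' ||
         c == '?' || c == '@' || c == '$';
}

// Stand-in for a real demangler: knows only two names.
std::optional<std::string> FakeDemangle(std::string_view s) {
  if (s == "_Z1fv") return std::string("f()");
  if (s == ".?AVCNetMidLayer@@")
    return std::string("class CNetMidLayer `RTTI Type Descriptor Name'");
  return std::nullopt;
}

std::string DemangleStream(std::string_view in, const Demangler& demangle) {
  std::string out;
  std::size_t i = 0;
  while (i < in.size()) {
    const bool starts_candidate =
        in[i] == '.' || in[i] == '?' || in.substr(i, 2) == "_Z";
    if (!starts_candidate) {
      out += in[i++];
      continue;
    }
    // Extend the candidate over all characters that could be part of it.
    std::size_t end = i + 1;
    while (end < in.size() && IsManglingChar(in[end])) ++end;

    if (auto demangled = demangle(in.substr(i, end - i))) {
      out += *demangled;
      i = end;  // success: skip the whole candidate
    } else {
      out += in[i];
      ++i;  // failure: emit one char only, then retry from the next one
    }
  }
  return out;
}
```

With this fallback, the input "._Z1fv" first fails as a candidate starting at the period, emits just ".", and then retries at "_Z1fv", so an Itanium name directly after a period is still demangled. The cost is that a failed candidate gets rescanned from its next character.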

Not supporting this in streaming mode at all isn't super unreasonable either imho, since that's what we do for itanium type manglings ("Pi").

But eventually I'll probably want to do the smarter backtracking -- it should fire rarely enough that it shouldn't affect perf much.