handle RTTI type descriptors

froydnj commented 6 years ago

This pull request is a less ambitious version of #16, and is slightly nicer. Offering this one mostly as a basis for discussion for implementing the rest of the RTTI manglings, which are ~50% of the ~2000 symbols left to demangle in libxul. There's still some special-casing for the operator mangling, but I think that's unavoidable: in fact, it might even be reasonable to do:

enum Symbol {
  UserDefined(...),
  CompilerGenerated(...),
}

because the rules for mangling compiler-generated symbols are not nearly as uniform as user-defined ones, and I suspect that special handling will be required for all of them, in one form or another. This pull request doesn't add the requisite special handling when printing the name to match undname, but some sort of tricks would probably be necessary.
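To make the proposed split concrete, here is a minimal sketch of what the two-variant enum could look like. The payloads are elided in the comment above, so everything inside the variants is an assumption for discussion: RttiKind and classify are hypothetical names, not code from this pull request. The _R0 through _R4 codes are the standard MSVC special names for the five RTTI structures.

```rust
/// Hypothetical payload for compiler-generated symbols (not from the PR).
#[derive(Debug, PartialEq)]
enum RttiKind {
    TypeDescriptor,           // _R0
    BaseClassDescriptor,      // _R1
    BaseClassArray,           // _R2
    ClassHierarchyDescriptor, // _R3
    CompleteObjectLocator,    // _R4
}

/// The two-way split suggested above, with assumed payloads.
#[derive(Debug, PartialEq)]
enum Symbol {
    UserDefined(String),
    CompilerGenerated(RttiKind),
}

/// Hypothetical helper: map a special-name code to the matching variant,
/// falling back to a user-defined name for anything else.
fn classify(code: &str) -> Symbol {
    match code {
        "_R0" => Symbol::CompilerGenerated(RttiKind::TypeDescriptor),
        "_R1" => Symbol::CompilerGenerated(RttiKind::BaseClassDescriptor),
        "_R2" => Symbol::CompilerGenerated(RttiKind::BaseClassArray),
        "_R3" => Symbol::CompilerGenerated(RttiKind::ClassHierarchyDescriptor),
        "_R4" => Symbol::CompilerGenerated(RttiKind::CompleteObjectLocator),
        other => Symbol::UserDefined(other.to_string()),
    }
}
```

With a split like this, the printing code can match on CompilerGenerated once and keep the per-kind special cases out of the user-defined path.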

froydnj commented 6 years ago

Actually, I don't know that this scheme works out better everywhere. It obviously works out better for this particular RTTI thing, but it does not work so well for base class arrays and class hierarchy descriptors (where you have to read a name for the operator, and then the operator is entirely unscoped). I guess for those we can avoid reading scopes in read_name? But then you still have the problem with printing things...

froydnj commented 6 years ago

Same problem for base class descriptors, too. And I think complete object descriptors are weird enough to require their own special set of handling as well.

jrmuizel commented 6 years ago

I don't understand. Can't the normal scope handling handle the name?

e.g. for ?_R2A@@8
read_name() handles _R2A@@
 - read_unqualified_name() takes _R2 returning a RTTIBaseClassArray operator
 - read_scope() takes A@@ 

match c { handles the 8
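The walkthrough above can be sketched roughly as follows. This is a simplified model, not the code from the linked commit: the Parser, Operator, and Name types and the parse helper are assumptions made for illustration, only the _R2 operator is recognised, and the input is the full symbol ??_R2A@@8 including its leading ?.

```rust
/// Hypothetical cursor over the remaining mangled input.
struct Parser<'a> {
    input: &'a str,
}

#[derive(Debug, PartialEq)]
enum Operator {
    RttiBaseClassArray,
}

#[derive(Debug, PartialEq)]
struct Name {
    op: Operator,
    scope: Vec<String>, // innermost scope first, as in the mangling
}

impl<'a> Parser<'a> {
    /// Consume `prefix` if the input starts with it.
    fn consume(&mut self, prefix: &str) -> bool {
        match self.input.strip_prefix(prefix) {
            Some(rest) => {
                self.input = rest;
                true
            }
            None => false,
        }
    }

    /// Takes "?_R2" and returns the operator; reads no class name here.
    fn read_unqualified_name(&mut self) -> Option<Operator> {
        if self.consume("?_R2") {
            Some(Operator::RttiBaseClassArray)
        } else {
            None
        }
    }

    /// Takes "A@@": '@'-terminated names, then a closing '@'.
    fn read_scope(&mut self) -> Option<Vec<String>> {
        let mut scope = Vec::new();
        while !self.consume("@") {
            let end = self.input.find('@')?;
            scope.push(self.input[..end].to_string());
            self.input = &self.input[end + 1..];
        }
        Some(scope)
    }

    fn read_name(&mut self) -> Option<Name> {
        let op = self.read_unqualified_name()?;
        let scope = self.read_scope()?;
        Some(Name { op, scope })
    }
}

/// Returns the parsed name plus the character left for the caller's match.
fn parse(symbol: &str) -> Option<(Name, char)> {
    let mut p = Parser { input: symbol.strip_prefix('?')? };
    let name = p.read_name()?;
    let c = p.input.chars().next()?;
    Some((name, c))
}
```

In this model parse("??_R2A@@8") leaves the 8 untouched for the caller, which is the point of the walkthrough: the class name A is consumed by read_scope, not by the operator.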

jrmuizel commented 6 years ago

i.e. https://github.com/jrmuizel/msvc-demangler-rust/commit/19e3a6d69406d3516b768b4af42166ba99bb86ab seems to handle this just fine.

froydnj commented 6 years ago

> I don't understand. Can't the normal scope handling handle the name?

Sure. But after you've read the operator + name for e.g. base class array, you are still underneath read_unqualified_name:

read_name
  read_unqualified_name
    read_operator

and when you come out of read_unqualified_name, read_name is going to try to read a scope from 8, which does not work very well.

You could instead just read an unqualified name for the base class array operator, and then have read_name handle the scope. I think that works out OK. I'm not sure that printing that would be straightforward, though I think you could do it. But that seems inelegant compared to how things are actually mangled.
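On the printing side, a rough sketch of how an operator plus a separately read scope could be rendered, assuming the scope is stored innermost-first as it appears in the mangled string. PrintableName is a hypothetical type for illustration, not something in the crate, and the sketch ignores the extra arguments a base class descriptor would need.

```rust
use std::fmt;

/// Hypothetical pairing of an unscoped operator with the scope read after it.
struct PrintableName {
    operator: &'static str,
    scope: Vec<String>, // innermost first
}

impl fmt::Display for PrintableName {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Print the enclosing scopes outermost-first...
        for part in self.scope.iter().rev() {
            write!(f, "{}::", part)?;
        }
        // ...then the operator in undname's `...' quoting.
        write!(f, "`{}'", self.operator)
    }
}
```

For operator "RTTI Base Class Array" and scope ["A"], this formats as A:: followed by the quoted operator name, which is the shape undname prints for ??_R2A@@8.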

jrmuizel commented 6 years ago

The name is not read as part of the operator for base class array. It's read as part of the scope. Do you have an example symbol that fails with the commit that I posted?

froydnj commented 6 years ago

That seems like a bizarre way to structure the code, but it would work; I don't have any counterexamples.

jrmuizel commented 6 years ago

Why is it bizarre? Isn't the name a scope and so it makes sense to read it there?

jrmuizel commented 6 years ago

??_R2?4A@@8 is a test case that makes sure we don't parse operator names after _R2. undname gives A::`5'::`RTTI Base Class Array'
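A sketch of a scope reader that treats ?4 as a discriminator and not as the start of an operator name. This is an assumption-laden model inferred from the undname output quoted above, not the crate's actual read_scope: it handles only the single-digit number encoding (digit d stands for d + 1, so 4 prints as 5) and plain '@'-terminated names.

```rust
/// Hypothetical scope reader. Returns the scope pieces (innermost first)
/// and the unconsumed remainder of the input.
fn read_scope(mut input: &str) -> Option<(Vec<String>, &str)> {
    let mut scope = Vec::new();
    loop {
        // A bare '@' closes the scope list.
        if let Some(rest) = input.strip_prefix('@') {
            return Some((scope, rest));
        }
        if let Some(rest) = input.strip_prefix('?') {
            // '?' followed by a digit: a discriminator, printed as `N'.
            // Single-digit encoding only: digit d encodes the number d + 1.
            let digit = rest.chars().next()?.to_digit(10)?;
            scope.push(format!("`{}'", digit + 1));
            input = &rest[1..];
        } else {
            // Ordinary name, terminated by '@'.
            let end = input.find('@')?;
            scope.push(input[..end].to_string());
            input = &input[end + 1..];
        }
    }
}
```

Feeding it the text that follows _R2 in the test case, ?4A@@8, yields the discriminator piece and A as two scope entries and leaves 8 as the remainder.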

froydnj commented 6 years ago

> ??_R2?4A@@8 is a test case that makes sure we don't parse operator names after _R2. undname gives A::`5'::`RTTI Base Class Array'

It's not clear to me that just because undname demangles something, that something actually makes sense (i.e. would be emitted by the compiler). A base class array for a discriminator? Who writes that?

> Why is it bizarre? Isn't the name a scope and so it makes sense to read it there?

Because something like LLVM's MSVC mangler doesn't emit a scope there; it emits a name and it's useful to be able to compare the mangler and demangler for symmetry. It'd be worth a comment, at least, saying why we were using a scope when the mangler uses a fully-qualified name.

jrmuizel commented 6 years ago

> It's not clear to me that just because undname demangles something, that something actually makes sense (i.e. would be emitted by the compiler). A base class array for a discriminator? Who writes that?

For sure. However, these weird test cases can give insight into how undname does its parsing.

> Because something like LLVM's MSVC mangler doesn't emit a scope there; it emits a name and it's useful to be able to compare the mangler and demangler for symmetry. It'd be worth a comment, at least, saying why we were using a scope when the mangler uses a fully-qualified name.

I believe LLVM's mangleName() ends up using mangleUnqualifiedName() in mangleNestedName() for each parent up the scope chain. Since it's the record declaration being passed to mangleName() in mangleCXXRTTIBaseClassArray() I picture it as though we're already on the first pass through mangleNestedName.
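To illustrate that reading of the mangler, here is a toy model in Rust. It is not LLVM's code: mangle_name and mangle_rtti_base_class_array are hypothetical stand-ins for mangleName() and mangleCXXRTTIBaseClassArray(), and back-references, templates, and every other name form are ignored. The record's own name and each enclosing scope are emitted the same way, one '@'-terminated piece per level, innermost first.

```rust
/// Toy model of name mangling: `path` is outermost-first, e.g. ["N", "A"]
/// for class A in namespace N. Every level, including the record itself,
/// is emitted as an '@'-terminated piece, innermost first.
fn mangle_name(path: &[&str]) -> String {
    let mut out = String::new();
    for part in path.iter().rev() {
        out.push_str(part);
        out.push('@');
    }
    // A final '@' closes the qualified name.
    out.push('@');
    out
}

/// Toy model of the RTTI base class array symbol: the "??_R2" prefix,
/// the record's qualified name, then '8'.
fn mangle_rtti_base_class_array(path: &[&str]) -> String {
    format!("??_R2{}8", mangle_name(path))
}
```

Under this model the record name is just the innermost entry of the scope chain, so a demangler that reads everything after _R2 as a scope stays symmetric with the mangler's output.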

mstange / msvc-demangler-rust

handle RTTI type descriptors #38