itanium-cxx-abi / cxx-abi

Proposal: Include an optional specification for mangling names that reference anonymous symbols #158

Open gmalecha opened 1 year ago

gmalecha commented 1 year ago

In some instances (see link below for some examples), it is very convenient to have stable names for symbols even when they have internal linkage. This is the case for anonymous types in C++. A discussion about extending the Clang mangler suggested that I open an issue here to gauge the feasibility of extending the specification with an optional scheme for mangling names that reference anonymous types.

Here is a draft of some proposed changes to solicit feedback. This does not cover all possible issues, but it does cover many common issues. I can look into what it would require to make a formal PR if there is interest. Note that these should all be backwards compatible because they should be treated as opaque names.

Global Unnamed Structs

struct { int x; } y; // _Z2.y

Use the name of the first declaration, prefixed with a ., e.g. .y.

Anonymous enums

enum { X = 0 }; // _Z2.X

Use the name of the first enumerator, prefixed with a ., e.g. .X.

Unnamed Structs inside aggregates

struct C {
  struct { int x; }; // _ZN1C2.xE
  struct { int y; }; // _ZN1C2.yE
};

Prefix the name of the first field with a ., e.g. .x. I think it is reasonable to repeat the . for each nested aggregate, but we could also length encode it.

struct C {
  struct { 
    struct { int x; }; // _ZN1C3..xE  or _ZN1C3.2xE
  };                   // _ZN1C2.xE
};
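
The draft rules above can be sketched as plain string construction. This is only an illustration, not part of the proposal: `source_name`, `anonymous_name`, `mangle`, and `demo` are hypothetical helpers, and the sketch assumes the repeated-`.` variant for nested unnamed aggregates rather than the length-encoded one.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Encode one identifier as <decimal length><text>, the way names appear
// in the examples above (e.g. "C" -> "1C", ".x" -> "2.x").
std::string source_name(const std::string& id) {
    return std::to_string(id.size()) + id;
}

// Hypothetical: name of an unnamed entity under the draft scheme.
// `depth` counts the unnamed aggregates wrapping the first member
// (1 = the member sits directly inside the unnamed entity).
std::string anonymous_name(const std::string& first_member, std::size_t depth = 1) {
    return std::string(depth, '.') + first_member;
}

// Mangle a name from its enclosing named scopes (outermost first).
// No scopes gives _Z<name>; otherwise _ZN<scopes...><name>E.
std::string mangle(const std::vector<std::string>& scopes, const std::string& name) {
    if (scopes.empty()) return "_Z" + source_name(name);
    std::string out = "_ZN";
    for (const auto& scope : scopes) out += source_name(scope);
    return out + source_name(name) + "E";
}

// Reproduces the names quoted in the comments of the examples above.
void demo() {
    assert(mangle({}, anonymous_name("y")) == "_Z2.y");
    assert(mangle({"C"}, anonymous_name("x")) == "_ZN1C2.xE");
    assert(mangle({"C"}, anonymous_name("x", 2)) == "_ZN1C3..xE");
}
```

Because the leading dots are counted as part of the length-prefixed name, a demangler that does not know the scheme can still skip over the component as an opaque name.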

tahonermann commented 1 year ago

I previously worked on a product that required stable names for anonymous and unnamed types such as those listed in the prior comment as well as for template definitions (not for specializations, but rather for primary template definitions and partial specialization definitions). I would therefore support an effort to specify a mangling for such types (and for templates) for use in tools like static analyzers and profilers; tools that have a need for names that remain (reasonably) stable as the source code evolves over time.

zygoloid commented 1 year ago

Having a documented scheme that supports demangling of such names seems useful, even if it's not an ABI requirement.

I think it would be preferable to use $ rather than . in whatever scheme you suggest, if you need to use a vendor-specific mangling; while both characters are reserved for vendor use, per http://itanium-cxx-abi.github.io/cxx-abi/abi.html#mangling-structure a . is currently used to separate a valid mangling from a vendor-specific suffix, and some tools split mangled names on .s before demangling them.
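To illustrate the concern, here is a minimal sketch of what such a tool might do. `strip_vendor_suffix` and `demo` are hypothetical names, and real tools may split differently; the point is only that a `.` inside the name is indistinguishable from the start of a vendor suffix.

```cpp
#include <cassert>
#include <string>

// Hypothetical tool behaviour: treat everything from the first '.' onward
// as a vendor-specific suffix and keep only the part before it.
std::string strip_vendor_suffix(const std::string& symbol) {
    // find() returns npos when there is no '.', so substr keeps the whole string.
    return symbol.substr(0, symbol.find('.'));
}

void demo() {
    // A suffix after a complete mangling is stripped as intended.
    assert(strip_vendor_suffix("_Z1fv.constprop.0") == "_Z1fv");
    // A '.' inside the name cuts the mangling short, leaving an invalid name.
    assert(strip_vendor_suffix("_ZN1C2.xE") == "_ZN1C2");
    // The same name spelled with '$' passes through untouched.
    assert(strip_vendor_suffix("_ZN1C2$xE") == "_ZN1C2$xE");
}
```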

However, I think our starting point should be documenting the existing _ZL... mangling scheme for internal-linkage names, and seeing how far we can get with that.

gmalecha commented 1 year ago

Thanks for the feedback @zygoloid. Using $ seems perfectly fine to me.

> However, I think our starting point should be documenting the existing _ZL... mangling scheme for internal-linkage names, and seeing how far we can get with that.

I assume that isn't documented anywhere; I don't know how I would learn about it. I am a bit worried about having a different mangling for a symbol depending on whether it has internal or external linkage, though. What is the motivation for that?

zygoloid commented 1 year ago

> I am a bit worried about having a different mangling for a symbol based on whether it is internal or external linkage though. What is the motivation for that?

The motivation was to support the case where an internal-linkage symbol and an external-linkage symbol would otherwise have the same mangling and both be visible in the same translation unit / object file. I think all the cases where that could happen are no longer valid, but it used to be possible to do things like:

static void f(); // #1
void g() {
  f(); // call #1
  int f;
  {
    extern void f(); // #2
    f(); // call #2
  }
}

... where #1 would call an internal-linkage function and #2 would call an external-linkage function. (The language rule changed via a C++ Defect Report; GCC 10 and earlier and Clang 3.3 and earlier implement the old rule.)
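
For reference, a rough sketch of how the two declarations of f end up with distinct symbol names. It assumes the L marker is placed directly before the length-prefixed name, which matches what current GCC and Clang appear to emit for a namespace-scope `static void f()`; since the scheme is described in this thread as undocumented, treat that as an observation rather than a rule. `Linkage`, `mangle_void_function`, and `demo` are hypothetical names.

```cpp
#include <cassert>
#include <string>

enum class Linkage { External, Internal };

// Mangle a namespace-scope function `void name()` (no parameters).
// Assumption: internal linkage adds an 'L' right after the "_Z" prefix.
std::string mangle_void_function(const std::string& name, Linkage linkage) {
    std::string out = "_Z";
    if (linkage == Linkage::Internal) out += "L";
    out += std::to_string(name.size()) + name;  // length-prefixed identifier
    return out + "v";                           // 'v' encodes the empty parameter list
}

void demo() {
    // #1: static void f();  and  #2: extern void f();
    assert(mangle_void_function("f", Linkage::Internal) == "_ZL1fv");
    assert(mangle_void_function("f", Linkage::External) == "_Z1fv");
}
```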