itanium-cxx-abi / cxx-abi

C++ ABI Summary
508 stars 96 forks source link

Mangled Union Initializer in a Template parameter fails if the 'field' is blank. #139

Open erichkeane opened 2 years ago

erichkeane commented 2 years ago

I'm reasonably convinced this is an issue in the mangling scheme with a union initializer. I see that GCC has the same problem as Clang does in; https://github.com/llvm/llvm-project/issues/54588

The issue, in summary, is that trying to mangle the initializer expression to: dummy<wrapper<int> { 42 }>(); which initializes the 'unnamed struct' field of the union.

The appropriate mangling section (https://itanium-cxx-abi.github.io/cxx-abi/abi.html#mangle.braced-expression) has us using this production: ::= di <field [source-name])> <[braced-expression]> # .name = expr

However, in this case, we cannot mangle a source-name, as one does not exist!

For reference, ote that if I change the struct to have a field-name (FOO), we get: _Z5dummyIXtl7wrapperIiEtlNS1_Ut_Edi3FOOtlNS2_Ut_ELi42EEEEEEiv which demangles to : int dummy<wrapper{wrapper::'unnamed'{.FOO = wrapper::'unnamed'::'unnamed'{42}}}>()

So, what are we expected to do here? Should we omit the name entirely and just use the <expression> version here? Or is this a case where we should still do the di and use a different production for the field name? I note that we have an 'unnamed-type-name' production, but nothing like that for a 'source-name'.

rjmccall commented 2 years ago

Is there a reason not to just look through anonymous structs and unions and treat the inner member as if it were a direct member of the containing class?

erichkeane commented 2 years ago

I responded on the Clang bug (https://github.com/llvm/llvm-project/issues/54588) with what I believe is your suggestion. I think that is alright, though I presume we would want some level of ABI-mangling-guarantee that this is how you guys want to handle this.

jicama commented 2 years ago

Is there a reason not to just look through anonymous structs and unions and treat the inner member as if it were a direct member of the containing class?

This makes sense to me.

erichkeane commented 2 years ago

The only concern I have to that approach (assuming I understand it properly?) is that we might end up with duplicate field names that way. So something like:

union {
  struct {
    T val;
    struct {
      T val;
      T val2;
    };
};

then: wrapper<int>{42,44,45} Ends up getting mangled with val=int{42},val=int{44},val=int{45}. Presuming we mandate the order be declaration order (and perhaps depth-first descent?) in this way though, I don't suspect it would be a problem.

I have no idea how to write a production to say that however.

rjmccall commented 2 years ago

Is there a situation where that's legal? Members of anonymous unions are required to be tested for redeclaration in the enclosing scope ([class.union.anon]p1), and I would hope that compilers all use the same rule for anonymous structs.

erichkeane commented 2 years ago

Ah! Seemingly not. I ran an experiment at work that made me think this was going to be a problem (which had surprised me at the time), but I must have made a mistake when putting it together around the crashing compilers.

SO I guess I have no complaints about the proposal. How do we formalize it in the ABI?

rjmccall commented 2 years ago

Oh wait, you know, I think I was answering a different question.

The ABI does seem to be missing wording about how to mangle e.g. member expressions that resolve to an anonymous union member. One could argue that it's covered by the rule to traverse the syntactic expression tree and not expand e.g. implicit conversions, and indeed that's what compilers all seem to do, but still, it's better to be precise. To do that, we'd just add a sentence or two to the section about anonymous entities.

But you're asking about how to mangle an initializer which has resolved against the anonymous struct itself. I think there's some confusion here, because the ABI does not say that you're supposed to mangle the name of the initialized member when mangling a braced-init-list. The general rule is that the mangling grammar follows the syntactic tree, and that's the intent for braced-expression. di is for when the source includes a member designator (an extension prior to C++20, and still an extension if nested). So wrapper { 42 } should be mangled as tl7wrapperLi42EE, with no attempt to mangle the path down to the member being initialized. And of course that means we don't have to worry about designators that name anonymous aggregates. (A designator that names an anonymous aggregate member would be mangled as the simple name of that member, just as written in source.)

To be clear, that is true in general, not just for initializers that happen to resolve against anonymous aggregates.

erichkeane commented 2 years ago

I see, ok then. It seems that the implementations are both doing it wrong by assuming that the union-case should always include the member initialize implicitly in this case. Though, I guess I don't see why a more simple case of otherwise equivalent instantiations that are otherwise equivalent (besides the syntax used to initialize it) should have different symbols.

rjmccall commented 2 years ago

Well, in general, the mangling distinguishes a large number of semantically-equivalent expressions. So e.g. we mangle (*p).member) differently fromp->membereven when we know thatp` has pointer type. There's only so much we can do.

But you're right, I am once again doing this wrong. The mangling for non-type template arguments does need to be independent of spelling and is controlled by https://github.com/itanium-cxx-abi/cxx-abi/issues/63, which has unfortunately not yet been written up as a PR for inclusion in the ABI document. That rule says that we have to mangle the active member of a union, and that's what's blowing up here, where the active member of a union is an anonymous aggregate.

The section on Anonymous Entities says that:

For the purposes of mangling, the name of an anonymous union is considered to be the name of the first named data member found by a pre-order, depth-first, declaration-order walk of the data members of the anonymous union.

So I think that's the correct rule to use here. In the original example from the Clang bug, the name for mangling purposes of the active member of the anonymous union would be val, since that is the first named data member found by a walk of the active member (the anonymous struct).

Since technically this is already covered by the ABI, there's no need for a change. However, I'm going to go ahead and write up a PR to integrate Issue #63 into the document, and I'll cover this explicitly there.

erichkeane commented 2 years ago

Ok, thank you for the clarification, ~I think I have enough to fix this in Clang then~. I'll make sure to add you to the review when I get around to implementing it.

EDIT: Actually, what about this case:

template <typename T>
class wrapper {
public:
    union {
        struct {
            T val;
                        T val2;
        };
    };
};

template <auto tparam> int dummy() { return 0; }

int main() {
    dummy<wrapper<int> {}>(); // succeeds due to "isZeroInitialized()"
    dummy<wrapper<int> { 42,43 }>();
    dummy<wrapper<int> { 42,44 }>();
    return 0;
}

Those two should be mangled differently, right? But the "For the purposes of mangling, the name of an anonymous union is considered to be the name of the first named data member found by a pre-order, depth-first, declaration-order walk of the data members of the anonymous union." rule you quoted above would ahve them both be val=42, right?

rjmccall commented 2 years ago

The active member of the union is the same in both of those cases: it's the member that contains the anonymous struct. The name of that member for mangling purposes is val.

I believe the way this is supposed to be mangled according to #63 is something like val = <anonymous struct for val> {42, x}, where x is the second element. I'm not sure what the mangling for an anonymous type is meant to be; if I understand correctly, the rule in Anonymous Entities is for the mangling for the declared object, not of the anonymous type. In any case, arguably the direct-initialization is unnecessary and it ought to just be val = {42, x}.

erichkeane commented 2 years ago

Ah! Thanks and sorry for my confusion. I missed that the DFS was ONLY for the name, not the entire initialization.

Thanks!

erichkeane commented 2 years ago

Sorry, 1 more question! In this case, do we want to enumerate the bases? So given:

struct base {
  int FIRST;
};

....
union {
  struct : base {
    int SECOND;
  };
};

Do we name it "FIRST" or "SECOND"? Do we consider VIRTUAL bases, or just 'normal' bases?

One thing to note: I don't think the Clang AST keeps the order of virtual and normal bases, so any sort of 'declaration order' on bases is likely a more sizeable change for us in this regard.

rjmccall commented 2 years ago

I don't think anonymous structs are supposed to be able to have base classes; that seems like an oversight on Clang's behalf. GCC does not allow that. Clang also does not implement it "correctly"; the members of the base are not injected into the enclosing scope.

I don't think that question can come up otherwise; the traversal should never descend into a named class because it would have to be a named member.

erichkeane commented 2 years ago

I'm not sure about the standard, I haven't really looked into it very closely in this regard, however I note that GCC is the only of the 'big 4' that enforce this : https://godbolt.org/z/EG3nEnMrn

That said, I think you've answered the question that we don't need to look into bases.