use of the same substitution for different template parameters is very hard to demangle

itanium-cxx-abi / cxx-abi

C++ ABI Summary

500 stars 93 forks source link

use of the same substitution for different template parameters is very hard to demangle #106

Open zygoloid opened 4 years ago

zygoloid commented 4 years ago

Example:

template<typename T> struct X { X(T f) { f(0); } template<typename ...U> void operator()(U...); };
template<typename T> void f(T&&) { X{[](auto &&, auto...){}}(); }
void g() { f(0); }

GCC, Clang, and EDG agree that the operator() mangling comes out as _ZN1XIZ1fIiEvOT_EUlS2_DpT0_E_EclIJEEEvDpT_, which no-one can demangle. At least three different things go wrong here:

1) The OT_ referring to the first parameter in the lambda-sig got rewritten to a substitution S2_. This completely breaks LLVM's demangling strategy, which rewrites template parameters (and substitutions) to the corresponding type as they're parsed, and so has no way to even represent a substitution that might mean two completely different things in different contexts, as happens here.

2) The Dp apparently confuses GCC's demangler, leaving it unable to see that T0_ means auto.

3) The (hidden by substitution) T_ and T0_ appear in a context where there is a level of template parameters in scope already. That confuses LLVM's demangler (but that seems like a comparatively straightforward bug).

My focus here is problem 1: allowing references to the lambda's implicit template parameter to be rewritten as a substitution referring to f's template parameter seems problematic. It's not clear whether the rules intend that, but it's at least what three different compilers do. That choice means that we can't expand substitutions as we parse during demangling -- we must preserve the original form of the substitution string and re-process it, because a T_ appearing within it can mean different things for different uses of the same substitution.

Perhaps distinct template parameters should never be considered the same for the purpose of forming substitutions, even if they have the same depth and index.

zygoloid commented 4 years ago

It looks like this problem manifests in non-lambda situations too. For example:

template<typename T> auto f(T&&) { struct A {}; return A(); }
template<typename T> auto g(T&&) { struct B {}; return B(); }
using A = decltype(f(0));
using B = decltype(g(0.0));
template<typename ...T> struct X {};
void h(X<A, B>) {}

produces the mangling _Z1h1XIJZ1fIiEDaOT_E1AZ1gIdEDaS2_E1BEE containing a backreference from g's T&& to f's T&&. libstdc++'s demangler produces:

h(X<f<int>(int&&)::A, g<double>(int&&)::B>)

and libc++abi's demangler produces:

h(X<auto f<int>(int&&)::A, auto g<double>(int&&)::B>)

... which are both wrong in the same way: the second int&& should be double&&, but to get that result, you need to re-expand the nodes referenced by S2_ with a different binding for T_.

zygoloid commented 4 years ago

I think on reflection this might be a bug in the various implementations. In particular, we say: "Note that substitutable components are the represented symbolic constructs, not their associated mangling character strings."

It seems to me that the template parameters of distinct template specializations (such as f's T and g's T in the second example) are distinct symbolic constructs, and the fact that template parameters with the same depth and index end up being treated as the same is an implementation detail that's incorrectly escaping into the manglings.

If that's right, then the first above example should mangle as _ZN1XIZ1fIiEvOT_EUlOT_DpT0_E_EclIJEEEvDpT_ and the second should mangle as _Z1h1XIJZ1fIiEDaOT_E1AZ1gIdEDaOT_E1BEE, both of which current demanglers are able to properly demangle.

zygoloid commented 4 years ago

What about function parameters? For example:

template<typename T> auto f(T a, decltype(a)) { struct A {}; return A(); }
template<typename T> auto g(T b, decltype(b)) { struct B {}; return B(); }
using A = decltype(f(0, 0));
using B = decltype(g(0.0, 0.0));
template<typename ...T> struct X {};
void h(X<A, B>) {}

is mangled by GCC and Clang as

_Z1h1XIJZ1fIiEDaT_DtfL1p_EE1AZ1gIdEDaS1_S2_E1BEE

and by ICC as

_Z1h1XIJZ1fIiEDaT_DtfL0p_EE1AZ1gIdEDaS1_S2_E1BEE

(the 1 vs 0 here seems like an unrelated issue). Should the S2_ instead be mangled as DtfL1p_E, given that symbolically the two fL1p_s have nothing to do with each other?

A demangler that chooses to demangle the above as

h(X<auto f<int>(int $p1, decltype($p1))::A, auto g<double>(int $p2, decltype($p2))::B>)

... would encounter exactly the same issues that we see for template parameters: the second decltype would presumably demangle as decltype($p1) rather than decltype($p2), because its substitution is a reference back to f's first parameter, not to g's first parameter.

rjmccall commented 4 years ago

The ABI has an example which clearly supports ICC about your unrelated issue:

template void f(T p, decltype(p)); // L = 1

I wonder if GCC and Clang are just failing to adjust L when mangling.

rjmccall commented 4 years ago

Should the S2_ instead be mangled as DtfL1p_E, given that symbolically the two fL1p_s have nothing to do with each other?

Well, I mean, they don't have nothing to do with each other. They're references to the same parameter, just being redundantly mangled. So I think the question here is whether that relationship is actually reasonable to ask demanglers to handle.

rjmccall commented 4 years ago

It seems to me that the template parameters of distinct template specializations (such as f's T and g's T in the second example) are distinct symbolic constructs, and the fact that template parameters with the same depth and index end up being treated as the same is an implementation detail that's incorrectly escaping into the manglings.

I agree that that's how it should work. I do worry about whether mangler implementations are going to end up having trouble with template redeclarations if we change this, though.

Also, fixing this is ABI-affecting. Do we think this case is marginal enough that this is actually acceptable to change?

zygoloid commented 4 years ago

Should the S2_ instead be mangled as DtfL1p_E, given that symbolically the two fL1p_s have nothing to do with each other?

Well, I mean, they don't have nothing to do with each other. They're references to the same parameter, just being redundantly mangled. So I think the question here is whether that relationship is actually reasonable to ask demanglers to handle.

One of them is the first parameter of f<int>, the other is the first parameter of g<double>. As with the template parameter case, I think substitutions are being used here because the parameters happen to have the same depth and index, but it seems like a big stretch to say that makes them be the same symbolic construct.

I do worry about whether mangler implementations are going to end up having trouble with template redeclarations if we change this, though.

I've implemented an approximation of this for Clang as follows: track for each substitution the innermost \<encoding> whose function/template parameters it refers to, and don't allow a substitution to be used once we leave its innermost referenced \<encoding>.

That's not entirely faithful to the idea that we use substitutions for the same symbolic construct (different \<encoding>s that refer to entities in the same scope won't be able to share substitutions), but it's a lot easier to implement than doing it correctly would be. I think it's also a lot easier to specify (a substitution that refers to a function or template parameter in an \<encoding> cannot be used outside of that \<encoding>). Maybe that's good enough?

Also, fixing this is ABI-affecting. Do we think this case is marginal enough that this is actually acceptable to change?

Well, I raised this issue because we hit this in practice -- one of our internal systems was unhappy that we were generating symbols we couldn't demangle (item 3 in the original comment), and while fixing that I noticed that the demangling was still wrong after fixing the demangler. The setup there was passing a local generic lambda to a function template that passed another local generic lambda to another template, resulting in both lambdas appearing in the mangling of the same symbol.

So it's not theoretical; such manglings do arise in practice. I've not seen any cases where this really matters for ABI reasons (where the same symbol would be generated in more than one translation unit), nor any cases where the identity of the symbol matters and duplicates with different manglings would be a real problem. But I would not discount the possibility that they're out there somewhere.

I think on balance using a rule that can be reasonably demangled is worth the risk of breaking something here. These cases are very uncommon right now but are going to become increasingly more common with increased adoption of generic lambdas and the use of lambdas in function signatures. Implementations could emit both symbols for a time if they're concerned (I think GCC does this sort of thing in some cases already).

zygoloid commented 4 years ago

I do worry about whether mangler implementations are going to end up having trouble with template redeclarations if we change this, though.

I've tried to implement this correctly (not merely approximately) -- that is, treat substitutions as symbolically different if they refer to function or template parameters of different entities. It does indeed seem to be difficult. It seems like we could either say:

1) Substitutions treat (function or template) parameters with the same depth and index as being the same, even if the actual parameter is unrelated -- this means that demanglers can't symbolically track substitutions, and will instead need to do re-demangling heroics (including handling the case where the substitution was expressed in a different way than the entity it's standing in for would have been). Hard for demanglers. 2) Substitutions treat references to the parameters of different functions / different template specializations as distinct -- this means that manglers can't use their existing notion of "same type" to track whether types are substitutable, as that notion will generally have false-positives when comparing types across scopes. 3) Some hybrid of the above.

Option 3 seems most promising to me. As noted above, my current foray into this space says that a substitution created within an \<encoding> can only be used within that same \<encoding> if it references a function or template parameter. That seems suitably easy to implement, and requires no changes to existing demanglers. If there's interest in that direction, I can try to gather some data on how common such manglings are.

If we don't want to risk an ABI change, then formalizing option 1 (the de facto current rule) and leaving demangler implementers with a headache seems like the path to take.

rjmccall commented 4 years ago

Wearing my Apple hat, I'm tentatively comfortable with an ABI change here. Apple's interest as a platform owner is in (1) maintaining binary compatibility with libc++ as a first-party library vendor and (2) allowing third parties to vend their own reasonable binary interfaces, and it doesn't seem like either of those is implicated here — I certainly hope users don't expect this kind of template declaration with lambdas and type inference to be a stable part of a library interface.

zygoloid commented 1 month ago

Another place where this comes us is when mangling requirements for member-like constrained friend templates:

template<typename T, typename U> concept C = true;
template<typename T, typename U> struct X {};

template<typename T> struct A {
  template<typename U> requires C<T, U> friend X<T, U> f(...) {}
};

void g(A<int> a) { f<float>(a); }

Clang mangles f as _ZF1fIfQ1CIT_TL0__EE1XIiS0_Ez. The S0_ substitution here refers abstractly to "template parameter depth 0 index 0", which is how Clang models U in the return type. That's also how Clang models T in the constraint, because the outer template parameters are in scope there. But a demangler should somehow know that the T_ in the constraint and the S0_ are referring to different template parameters. It would arguably be more in line with the ABI's requirement that we mangle using the symbolic identity of mangled entities to mangle this as _ZF1fIfQ1CIT_TL0__EE1XIiS1_Ez, with the S1_ referring to the TL0__ (which is the parameter U).

rjmccall commented 1 month ago

There's a case now with friend function templates where we need to be mangling the declaring type because friends of different types are different entities by the ODR, isn't there? Does this reliably fall into that case, or is it a more general problem? Or am I misremembering how that DR got resolved?

zygoloid commented 1 month ago

Yes, there is such a case, and it applies in the prior example (Clang currently gets this wrong for friend function templates, but we're working on fixing that -- the mangling of f should be _ZNAIiEF1fI...). This issue applies more generally, though:

template<typename T> concept C = true;

template<typename T> struct A {
  template<typename U> requires C<T> && C<U> void f(T, U) {}
};

void g(A<int> a) { a.f(0, 0); }

Clang mangles this as _ZN1AIiE1fIiQaa1CIT_E1CITL0__EEEviS2_. In the constraint, we have:

1CIT_E refers to T as T_; this is S2_
1CITL0__E refers to U as TL0__; this is S3_.

Then in the function parameters:

i is the substituted T
S2_ refers to U, but the substitution is a back-reference to the T from C<T> earlier, which has the same depth and index

rjmccall commented 1 month ago

I see. It seems wrong that U changes depth there after substitution of the outer template arguments, but I suspect that that would be much more deeply breaking.

zygoloid commented 1 month ago

If we could do everything all over again, it might make sense to number template depths from the inside out, not from the outside in, so that substituting outer levels (or ignoring them for friend declarations) doesn't change things that aren't dependent on the outer parameters -- and in particular, so that U would be at depth 0 in both the function parameter type and the constraints. But given where the implementations are, I don't think that's realistic.

Following on from my prior suggestion, here's a possible approach:

We define certain regions of the mangling as being distinct substitution scopes.
- Each encoding and generic lambda-sig has a distinct substitution scope.
- If an encoding has enclosing template parameters (beyond those on the function itself), then it has an additional substitution scope for its constraints; otherwise, the constraints are in the same substitution scope as the entity.
We define certain entities as being scoped entities:
- Entities that are referenced by depth and index (function parameters and template parameters).
- Scoped substitutions (see below).
We define substitutions as being scoped substitutions if they mention any scoped entities.
A substitution is only usable within a scope if either:
- It is not a scoped substitution, or
- It was created in the current substitution scope.
If a substitutable entity is to be mangled, and there is no usable substitution for it, then it is mangled normally (creating a new substitution).
The substitutions across all scopes are numbered using the same single sequential numbering.

So, broadly, we don't reuse substitutions when they contain a symbolic reference to a (function or template) parameter by depth and index unless we're in the same substitution scope where the substitution was created. This would purely be a compiler change: demanglers can continue to keep a flat list of substitutions.

I think that would address all the issues in this PR. It should also mean that compilers that use different notions of symbolic identity for function or template parameters (eg, depth and index versus de Bruijn indexes) should be able to use their own internal notions of "same type" for substitutions and still produce the same manglings as each other. It also seems pretty straightforward and efficient to implement, and shouldn't change any manglings that don't already run into problems where their substitutions don't make sense.

We would then mangle my previous example as _ZN1AIiE1fIiQaa1CIT_E1CITL0__EEEviT_. The first T_ is in the constraint substitution scope, and the second T_ is in the (distinct) substitution scope for the encoding.

For my second comment the mangling would change from _Z1h1XIJZ1fIiEDaOT_E1AZ1gIdEDaS2_E1BEE to _Z1h1XIJZ1fIiEDaOT_E1AZ1gIdEDaT_E1BEE because the two encodings have distinct substitution scopes, so the scoped substitution for T_ from the first one can't be reused in the second.