hsutter / cppfront

A personal experimental C++ Syntax 2 -> Syntax 1 compiler

[BUG] Can't interpolate captured function call #838

Open JohelEGP opened 9 months ago

JohelEGP commented 9 months ago

Title: Can't interpolate function call.

Minimal reproducer (https://cpp2.godbolt.org/z/c7b34nGvo):

main: (args) = {
  _ = :() "<(args.size())$>";  // Cpp1 error: `args` isn't captured.
  _ = :() "<(args.size()$)$>"; // Cpp2 error
}

Commands:

```bash
cppfront main.cpp2
clang++18 -std=c++23 -stdlib=libc++ -lc++abi -pedantic-errors -Wall -Wextra -Wconversion -Werror=unused-result -I . main.cpp
```

Expected result: The second interpolation to work.

Actual result and error: main.cpp2(3,27): error: no matching ( for string interpolation ending in )$

gregmarr commented 9 months ago

Looks like the issue here is that you're trying to do a lambda capture inside a string interpolation. That seems like it would be difficult at best to work out whether the $ was for the lambda capture or the string interpolation. I think I would be fine with saying that you can't do a lambda capture inside the string interpolation, you need to do the lambda capture first and then use the captured value in the string interpolation.

jcanizales commented 9 months ago

If you want to capture args, wouldn't you do this instead? "<(args$.size())$>"

JohelEGP commented 9 months ago

That copies args.

hsutter commented 7 months ago

Thanks for waiting. The $ capture applies to the immediate context, in this case the string. So if you want the function expression to capture args by pointer (reference), you can write that in the body of the function:

_ = :() -> _ = { a := args&$; return "<(a*.size())$>"; };

Does that resolve the question for you?
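For readers who think in Cpp1 terms, the difference between the two captures discussed so far (`args$` copies, `args&$` captures a pointer) can be illustrated with ordinary C++ lambdas. This is only an analogy, assuming a `$` capture behaves like a by-value init-capture. `Args`, `capture_by_value`, and `capture_by_pointer` are hypothetical names made up for this sketch, not cppfront's generated code.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for `args` that counts how often it is copied.
struct Args {
    std::vector<std::string> items;
    static inline int copies = 0;

    explicit Args(std::vector<std::string> v) : items(std::move(v)) {}
    Args(const Args& other) : items(other.items) { ++copies; }
    std::size_t size() const { return items.size(); }
};

// Analogue of `args$` (assumption): the closure owns its own copy of `args`.
inline auto capture_by_value(const Args& args) {
    return [a = args] { return std::to_string(a.size()); };
}

// Analogue of `args&$` (assumption): the closure stores only a pointer,
// so `args` must outlive the closure.
inline auto capture_by_pointer(const Args& args) {
    return [a = &args] { return std::to_string(a->size()); };
}
```

Calling `capture_by_value(args)` bumps `Args::copies`, while `capture_by_pointer(args)` leaves it unchanged, which is the reason for preferring `args&$` when `args` is expensive to copy.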

JohelEGP commented 7 months ago

This works, too (https://cpp2.godbolt.org/z/4sTxqjW3G):

  _ = :() "<(args&$*.size())$>"; // Works

The problem seems to be the )$)$ in :() "<(args.size()$)$>". Is this due to the grammar, just like #861?

hsutter commented 7 months ago

Interesting... my mental model for capture (including interpolation) has always been that it's referring to capture in the immediate context, here the string. This example is very interesting, because maybe it's pointing out that actually the model allows an interpolation to also mention a capture in the enclosing scope, by putting it in the interior of the interpolation... when the interpolation is applied, the implementation breaks the string into pieces... one of which contains an embedded interpolation. On thinking about it, that may make sense, but it was nonobvious to me that would fall out. Thanks for pointing this out!

hsutter commented 7 months ago

More: It's accidental that your example works... I did not intend to support nested captures like that.

It happens to work because the code currently scans forward for the next )$ and then backtracks to the matching ( and puts everything inside into its own expression to be evaluated.

For this case:

_ = :() "<(args&$*.size())$>"; // happens to work

it appears to "work" because it just so happens that the expression contains a capture that is preserved when the string is broken into pieces... because there is no nested )$ in the interpolation expression.

But for this case:

_ = :() "<(args.size()$)$>"; // doesn't work

we find the first )$ which appears to be an empty ()$ capture. Then when we find the second )$ it appears to have no matching opening ( and so that is reported as an error.
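The forward scan described above can be approximated in a few lines of C++. This is a rough sketch, not the actual lexer code: `ScanResult`, `matching_open`, and `scan_forward` are hypothetical names, and it assumes the backward search for `(` never crosses text already consumed by an earlier interpolation.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical result type for this sketch.
struct ScanResult {
    std::vector<std::string> expressions;  // text found between each ( and )$
    std::optional<std::string> error;      // set when a )$ has no matching (
};

// Walk backwards from the ')' at `close` to its matching '(' by counting
// parenthesis depth, never looking below index `floor`.
inline std::optional<std::size_t> matching_open(std::string_view s,
                                                std::size_t close,
                                                std::size_t floor) {
    int depth = 0;
    for (std::size_t i = close + 1; i-- > floor;) {
        if (s[i] == ')') ++depth;
        else if (s[i] == '(' && --depth == 0) return i;
    }
    return std::nullopt;
}

// Scan forward for each ")$", then backtrack to the matching "(".
inline ScanResult scan_forward(std::string_view s) {
    ScanResult result;
    std::size_t consumed = 0;  // everything before this index is already handled
    for (;;) {
        std::size_t close = s.find(")$", consumed);
        if (close == std::string_view::npos) return result;

        auto open = matching_open(s, close, consumed);
        if (!open) {
            result.error = "no matching ( for string interpolation ending in )$";
            return result;
        }
        result.expressions.emplace_back(s.substr(*open + 1, close - *open - 1));
        consumed = close + 2;  // skip past the ")$" just handled
    }
}
```

With this sketch, `scan_forward("<(args&$*.size())$>")` yields the single expression `args&$*.size()`, whereas `scan_forward("<(args.size()$)$>")` first extracts an empty expression from `()$` and then reports the missing `(` for the second `)$`.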

My (probably obvious) first reaction is to say "well, then scan for )$ starting at the back instead," which is doable if we really want to support such nested captures in interpolations only -- but (a) we don't for regular captures, so there's a consistency problem, and (b) I would want to work through more examples to see whether this would create visual ambiguities for the programmer (humans need to mentally parse this too... do we want to require the human to inspect the string from the end too?).
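To make the "scan from the back" idea concrete, here is a small C++ sketch of what such a scan might look like. It is purely illustrative: `find_open` and `scan_backward` are hypothetical names, and nothing here reflects how cppfront would actually implement it.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Walk backwards from the ')' at `close` to its matching '(' by counting depth.
inline std::optional<std::size_t> find_open(std::string_view s, std::size_t close) {
    int depth = 0;
    for (std::size_t i = close + 1; i-- > 0;) {
        if (s[i] == ')') ++depth;
        else if (s[i] == '(' && --depth == 0) return i;
    }
    return std::nullopt;
}

// Collect interpolation expressions right to left: take the LAST ")$" first,
// so any "$" nested inside it stays part of that expression.
inline std::vector<std::string> scan_backward(std::string_view s) {
    std::vector<std::string> found;
    std::size_t limit = s.size();  // only text before this index is unscanned
    while (limit >= 2) {
        std::size_t close = s.rfind(")$", limit - 2);
        if (close == std::string_view::npos) break;

        auto open = find_open(s, close);
        if (!open) break;  // unmatched ")$": stop (a real tool would diagnose)

        found.emplace_back(s.substr(*open + 1, close - *open - 1));
        limit = *open;  // continue with the text to the left of this "("
    }
    return found;
}
```

Under this scheme `"<(args.size()$)$>"` produces the single expression `args.size()$`, with the inner `$` left for the enclosing function expression to interpret. Note that the results come out right to left, which mirrors concern (b): a human reader would have to parse the string from the end in the same way.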

My second reaction is to regain consistency by actually rejecting both cases, including emitting an error on the first one by not allowing an interpolation to contain another $. I'm actually inclined to do that... what do you think?

JohelEGP commented 7 months ago

Maybe rejecting them makes sense (and I'm fine with it, in this case), but I'm not sure. I've been trying to build an intuition for nested captures. With https://github.com/hsutter/cppfront/issues/861#issuecomment-1860953624, I have a better understanding of how cppfront treats interpolations despite the grammar. For nested captures, I've done some exploration (see the last item of the list below).

At which point does having a nested $ become OK? I'm wondering if all these cases should be rejected, too (for consistency, too?):

JohelEGP commented 7 months ago

_ = :() "<(args.size()$)$>";

_ = :() "<(args&$*.size())$>";

> My second reaction is to regain consistency by actually rejecting both cases, including emitting an error on the first one by not allowing an interpolation to contain another $. I'm actually inclined to do that... what do you think?

> Maybe rejecting them makes sense (and I'm fine with it, in this case), but I'm not sure.

Now I'm more sure that these aren't nested captures and shouldn't be rejected.

My view is that string interpolation and an FE capture are similar yet fundamentally different operations that share a syntax (in the same way the name: signature = initializer syntax can be used to declare different kinds of entities, as recently discussed at https://github.com/hsutter/cppfront/discussions/714#discussioncomment-7997235). Once you have a string interpolation, its contents are just an expression. This expression can itself have FE captures (although things start breaking down if it wants to nest an interpolation, as recently discussed at #861). Rejecting this feature combination seems like an arbitrary limitation to me.

_ = :() "<(args.size()$)$>";

To accept this, perhaps we could permit a space:

_ = :() "<(args.size() $)$>";

This starts smelling like C++98's > > vs. C++11's >> nested template-ids.

JohelEGP commented 7 months ago

With regard to my expectations of #838 and #861, I think the grammar of string-literal isn't honest. The lex phase splits the source stream into tokens. But an interpolation includes expression, which suggests that the parse phase drives the lex phase. What actually happens is more like the handling of a metafunction, but moved from the parse phase to the lex phase. Whenever )$ is found in a string, it backtracks to a ( and lexes the contents. So it seems like an interpolation shouldn't be presented as grammar, but rather closer to how a metafunction is presented.