hsutter / cppfront

A personal experimental C++ Syntax 2 -> Syntax 1 compiler
Other
5.24k stars 223 forks source link

[SUGGESTION] Ability to disable UFCS both via cppfront and for regions in source code #1004

Closed bluetarpmedia closed 3 weeks ago

bluetarpmedia commented 4 months ago

I'd like the ability to disable UFCS with 2 mechanisms:

  1. A new switch on cppfront to control whether UFCS is enabled (defaults to true) for a whole Cpp2 file
  2. Some kind of #pragma push/pop or [[attribute]]-style code that cppfront would read when parsing the Cpp2 source code to control whether UFCS function calls are enabled for a region of code inside a Cpp2 file

Motivations:

Will your feature suggestion eliminate X% of security vulnerabilities of a given kind in current C++ code? No

Will your feature suggestion automate or eliminate X% of current C++ guidance literature? No

MaxSagebaum commented 4 months ago

I can second this. Also the error messages if the lookup fails are usually not very intuitive.

hsutter commented 4 months ago

Thanks for raising this. I appreciate the feedback. The three motivations I see above are: implementation bugs, debugging experience, and error message readability.

A fourth one I'm curious about: Have folks encountered significant problems with actually invoking an unintended function?

That last one would especially be motivation to make UFCS be opt-in in the language via a distinct third function call syntax, which doesn't need to be too heavyweight... One simple candidate I've considered would be .., as in vec..ssize() for example:

And I don't think .. would necessarily get in the way of future evolution, because if there were a "ranges operator" I think ... still works well for that (e.g., 1...10, begin()...end()). I think ... as a ranges operator doesn't collide with pack expansions because it occurs in a different place in the grammar (it would be a binary expression, whereas so far pack expansion ... in an expression can only occur as a unary operator or in the place where an identifier would appear, if I recall right).

zaucy commented 4 months ago

I think I'd prefer an opt-out syntax as opposed to an opt-in syntax. Debugging/error messages can be improved over time as tools become cpp2 aware (I hope 🤞.)

Something like using this to say "I don't want to use UFCS" feels somewhat intuitive to me.

// cpp2
vec.this.size();

// lowers to cpp1 as (no UFCS)
vec.size();

To me vec.this reads as "vec access as this" only because this is a keyword.

JohelEGP commented 4 months ago

Another opt-in syntax is .:. See https://github.com/hsutter/cppfront/issues/741#issuecomment-1806845249.

bluetarpmedia commented 4 months ago

A fourth one I'm curious about: Have folks encountered significant problems with actually invoking an unintended function?

No, I haven't experienced that. I like how it dispatches currently especially with inheritance (code below). A scenario I think has been discussed elsewhere is that adding a member function to a type could change existing code's behaviour if the existing code is depending on UFCS calling a non-member function (of the same name), but that hasn't happened to me so far (though perhaps would only happen with enough elapsed time and code refactoring / library upgrades).

UFCS with inheritance ```cpp my_string1: type = { this: std::string; using std::string::string; } my_string2: type = { this: std::string; using std::string::string; } // Special behaviour for `my_string1` trim: (inout s : my_string1) = { s.clear(); } // Default behaviour trim: (inout s : std::string) = { s.erase( s.begin(), std::find_if(s.begin(), s.end(), :(ch: char) -> bool = !std::isspace(ch))); s.erase( std::find_if(s.rbegin(), s.rend(), :(ch: char) -> bool = !std::isspace(ch)).base(), s.end()); } main: () -> int = { str1: my_string1 = (" hello, world "); str1.trim(); std::cout << "Expect this to be empty: [(str1)$]\n"; str2: my_string2 = (" hello, world "); str2.trim(); std::cout << "Expect this to be trimmed: [(str2)$]\n"; return 0; } ``` [Godbolt](https://cpp2.godbolt.org/z/o34zaE8Tc)
hsutter commented 4 months ago

Thanks! So the issues really are: active bugs, debugging experience, and error message readability.

I think I can do something about at least two of those, and maybe all three, in the next month.

For example, for error message readability I was considering something like changing the UFCS implementation from

    if constexpr (requires{ CPP2_FORWARD(obj).CPP2_UFCS_REMPARENS QUALID TEMPKW __VA_ARGS__(CPP2_FORWARD(params)...); }) { \
        return CPP2_FORWARD(obj).CPP2_UFCS_REMPARENS QUALID TEMPKW __VA_ARGS__(CPP2_FORWARD(params)...); \
    } else { \
        return MVFWD(CPP2_UFCS_REMPARENS QUALID __VA_ARGS__)(CPP2_FORWARD(obj), CPP2_FORWARD(params)...); \
    } \

to make the else into an else if to get the constraint that the fallback is actually legal, and if neither is legal have a final else with a nice diagnostic that the call is invalid. That final else could so test whether the result would have been valid if obj had been an lvalue, and if so give a nicer error that it was the definite last use that caused the issue.

That's a sketch... how does that sound?


Edited to add: Something like this...

    if constexpr (requires{ CPP2_FORWARD(obj).CPP2_UFCS_REMPARENS QUALID TEMPKW __VA_ARGS__(CPP2_FORWARD(params)...); }) { \
        return CPP2_FORWARD(obj).CPP2_UFCS_REMPARENS QUALID TEMPKW __VA_ARGS__(CPP2_FORWARD(params)...); \
    } else if constexpr (requires{ MVFWD(CPP2_UFCS_REMPARENS QUALID __VA_ARGS__)(CPP2_FORWARD(obj), CPP2_FORWARD(params)...); }) { \
        return MVFWD(CPP2_UFCS_REMPARENS QUALID __VA_ARGS__)(CPP2_FORWARD(obj), CPP2_FORWARD(params)...); \
    } else { \
        if constexpr (requires{ obj.CPP2_UFCS_REMPARENS QUALID TEMPKW __VA_ARGS__(CPP2_FORWARD(params)...); }) { \
            static_assert( false, "member function call would have succeeded if obj were an lvalue - is this illegal because this is a definite last use of obj?" ); \
        } else if constexpr (requires{ MVFWD(CPP2_UFCS_REMPARENS QUALID __VA_ARGS__)(obj, CPP2_FORWARD(params)...); }) { \
            static_assert( false, "free function call would have succeeded if obj were an lvalue - is this illegal because this is a definite last use of obj?" ); \
        } \
        } else { \
            static_assert( false, "attempts to call obj.f() and f(obj) are both invalid - did you spell the function name correctly?" ); \
        } \
    } \
MaxSagebaum commented 3 months ago

That would really improve the situation.

A side note: Allowing to disable UFCS on a call by call basis, could help to find bugs in the implementation.

bluetarpmedia commented 3 months ago

Sounds good, I look forward to testing with that new version!

hsutter commented 3 months ago

Adding emphasis:

Thanks! So the issues really are: active bugs, debugging experience, and error message readability.

I think I can do something about at least two of those, and maybe all three, in the next month.

Update: I think I've improved the third one sufficiently (error message readability) with today's commit. Thanks for pushing on this!

Next up, the other two:

What UFCS bugs do you most commonly encounter? Just an example here in this thread would be fine, or an issue number link if it's already reported.

For the debugging experience, you wrote "because it doesn't step over the function calls like I'd expect" -- can you give an example? (I saw a reference to #844 but that seems to be a different case where the generated function doesn't have a source location in the original code at all, whereas with UFCS the call site is in the original code... unless it's in generated code of course but then that's the proximate issue rather than the UFCS?)

bluetarpmedia commented 3 months ago

What UFCS bugs do you most commonly encounter?

My wish list would be #999 and #1002 -- I think those are the ones I see most often.

because it doesn't step over the function calls like I'd expect

I'll make a GIF/video of this. The problem was that the debugger required 2 steps to step over a UFCS call, instead of 1 like you'd expect. Almost as if the first step was invisibly stepping into the macro.

hsutter commented 3 months ago

Thanks, a video would be good.

JohelEGP commented 3 months ago

I recall that it's stepping into the IILE first.

bluetarpmedia commented 3 months ago

I recall that it's stepping into the IILE

Yes, I think that's it. Here's a video showing the issue in VS 2022.

https://github.com/hsutter/cppfront/assets/2060747/21b1985f-47ed-44c7-aa06-59bf62b2cebf

gregmarr commented 3 months ago

I would suggest using the option to turn off line numbers, and then debug the generated code to see exactly what's happening. I suspect that it's multiple lines of code in cpp that happen to map to the same line in the cpp2.

MaxSagebaum commented 1 month ago

As shown in the comment https://github.com/hsutter/cppfront/pull/904#issuecomment-2150622367 UFCS macro/template can have a real impact on compile times.

name compile compile adjusted compile relative
base_03_cpp2_struct-compile 14.14 0 0
cleanup_00_base-compile 25.48 11.35 1
cleanup_01_remove_UFCS-compile 19.49 5.36 0.47

In this thread there have been sugestions for a new syntax:

hsutter commented 4 weeks ago

Thanks! Since even after the changes already made above UFCS still continues to be an issue, including that we now have more data UFCS is a compile-time performance issue, I'm trying out adding .. as a "really use only a member" opt-out. That leaves UFCS as the default, but provides a way to not use it when it's not wanted.

Note: If in the future we find that UFCS should not be the default, we can make . find members only, and .. be the UFCS syntax. That would be a breaking change, but a mechanical one.

hsutter commented 4 weeks ago

Xref: #1101

hsutter commented 3 weeks ago

With 9648191ecf0000696024bb54ecb8202373a922e9 we have the ability to use .. to select only a member (e.g., str..append("xyzzy");. And we also have the other improvements prompted by this issue, including removing some bugs and improving diagnostics. I don't like regions where the same code means different things based on language pragmas... so I'd like to avoid having x.f() sometimes be UFCS and sometimes not, and you have to look around in the context to know.

My impression is that the changes now done probably do address the motivation behind the request to have such regions, so I'll close this as completed. But feel free to reopen if you think there's a strong reason to reconsider more in this area, and even a lively debate about regions with different meanings for the same code!