hsutter / cppfront

A personal experimental C++ Syntax 2 -> Syntax 1 compiler

[BUG] *= operator ambiguity with postfix * #1043

Closed aswaine closed 3 months ago

aswaine commented 3 months ago

I'm not sure if this is a bug, an observation or a suggestion...

Describe the bug: a* =b assigns b to the thing pointed to by a, whereas a*=b means a = a * b.

This doesn't occur with most other operator combinations; for example, a*>b and a* >b both compile and mean the same thing.

Technically &= has the same syntactic problem.

To Reproduce Steps to reproduce the behavior:

  1. Sample code - distilled down to minimal essentials please
    a: int = 1;
    b:= a&;
    c:= a*=2;
    d:= b* =2;
    e:= a*>3;
    f:= a* >3;
  2. Actual result/error (godbolt)
    int a {1}; 
    auto b {&a}; 
    auto c {a *= 2}; 
    auto d {*cpp2::impl::assert_not_null(b) = 2}; 
    auto e {cpp2::impl::cmp_greater(*cpp2::impl::assert_not_null(a),3)}; 
    auto f {cpp2::impl::cmp_greater(*cpp2::impl::assert_not_null(a),3)}; 

Additional context: This is actually what I was expecting it to do, but it's broken a property of the c/cpp1 operator set, that you don't have to put spaces between operators to avoid ambiguity.

Option 1: live with it

With a context-free grammar we can't disambiguate in the compiler. It's probably not going to lead to bugs because it should be caught by the compiler -- multiplication on a pointer makes no sense. And coding style can encourage spaces around =. But it feels destined to be a 'known gotcha', and to be something that needs teaching about the language: put a space between * and = unless you really mean *=.
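To illustrate the "caught by the compiler" point in Cpp1 terms, here is a minimal sketch. The trait name can_mul_assign is hypothetical (it is not part of cppfront or the standard library); it just asks the type system whether x *= y is well-formed for the given types.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical detection trait: true when `t *= u` compiles for T& and U.
template <class T, class U, class = void>
struct can_mul_assign : std::false_type {};

template <class T, class U>
struct can_mul_assign<T, U,
    decltype(void(std::declval<T&>() *= std::declval<U>()))>
    : std::true_type {};

// Compound multiplication is fine on an int...
static_assert(can_mul_assign<int, int>::value, "int *= int is valid");

// ...but ill-formed on a pointer, so a mistyped a*=b on a pointer is rejected.
static_assert(!can_mul_assign<int*, int>::value, "int* *= int is ill-formed");
```

So if a is a pointer and someone writes a*=b when they meant a* =b, the mistake should surface as a compile error rather than as silent misbehaviour.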

Option 2: change the operators

If more radical options are being considered:

I note on https://github.com/hsutter/cppfront/wiki/Design-note%3A-Postfix-unary-operators-vs-binary-operators that ^ was considered as a desirable alternative. I'm not an expert, but I think this might be possible if we're willing to rename the bitwise operators:

- ~ is pronounced "bitwise" (currently only used for bitwise NOT)
- Bitwise AND is renamed from & to ~&
- Bitwise OR is renamed from | to ~|
- Bitwise XOR is renamed from ^ to ~^
- Bitwise NOT is renamed from ~ to ~!
- Dereference is renamed from * to ^
- Reference remains &
- &=, |= and ^= become ~&=, ~|= and ~^=

* always means multiply (learning win). | becomes available for future syntax. ** also becomes available for exponentiation, which removes one thing to teach about the language -- by this point, it's probably a learning point that ** doesn't do exponentiation.

There's the obvious big downside of breaking consistency with other C-family languages, but arguably less than has already been made through changing unary operators to be postfix. And it is fairly rare that most people need to use bitwise operators, while it's easy to accidentally type a single & or | when a logical operator was intended -- this has the advantage of making bitwise operators stand out visually. The bitwise NOT operator is also maybe slightly more intuitive, although it starts to look weird to have ~! be postfix and ! prefix (which it has to be because of !=).

| is currently overloaded as a pipe operator in the ranges library, which would look weird spelled ~|, but I assume UFCS should make that go away.

JohelEGP commented 3 months ago

Max munch makes *= always the assignment operator, and anything else should be dereference followed by assignment. See https://github.com/search?q=repo%3Ahsutter%2Fcppfront%20commenter%3Ahsutter%20max%20munch&type=issues.
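For anyone unfamiliar with max munch, here is a toy sketch of the rule. This is not cppfront's actual lexer; lex_ops and its three-entry operator table are hypothetical and exist only to show longest-match tokenization.

```cpp
#include <string>
#include <vector>

// Hypothetical toy lexer: at each position, take the longest operator that
// matches; anything else becomes a one-character token. Spaces separate tokens.
std::vector<std::string> lex_ops(const std::string& src) {
    // Candidates are listed longest first, so "*=" wins over "*" when both match.
    static const std::vector<std::string> ops = {"*=", "*", "="};
    std::vector<std::string> out;
    std::size_t i = 0;
    while (i < src.size()) {
        if (src[i] == ' ') { ++i; continue; }
        bool matched = false;
        for (const auto& op : ops) {
            if (src.compare(i, op.size(), op) == 0) {
                out.push_back(op);
                i += op.size();
                matched = true;
                break;
            }
        }
        if (!matched) {
            out.push_back(std::string(1, src[i]));
            ++i;
        }
    }
    return out;
}
```

With this sketch, lex_ops("a*=b") yields the tokens a, *=, b, while lex_ops("a* =b") yields a, *, =, b. The space is what stops the lexer from fusing * and = into a single token.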

hsutter commented 3 months ago

Thanks! I don't think this is particularly a bug, as it's max munch. However, I agree that languages should try to avoid max munch being a surprise. It's never actually ambiguous (and in general mistakes won't compile because of the type system), but I agree it can be visually ambiguous.

So perhaps my confirmation bias is showing, but the most important phrase in the issue is this one:

> This is actually what I was expecting it to do,

Great / whew! 😌

> but it's broken a property of the c/cpp1 operator set, that you don't have to put spaces between operators to avoid ambiguity.

It's true that postfix * creates a new max munch visual ambiguity in cases of *= / * =. But it's not new... C and Cpp1 do already allow such examples, just not with *. For example:
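A standard C/Cpp1 illustration of this, given as a sketch with hypothetical function names and assuming the familiar a+++b case is the kind of example meant:

```cpp
// Hypothetical helper names, for illustration only.

// Max munch lexes `a+++b` as `a ++ + b`, i.e. (a++) + b, never a + (++b).
constexpr int no_space(int a, int b) {
    return a+++b;
}

// Adding a space changes the tokens: `a+ ++b` is a + (++b).
constexpr int with_space(int a, int b) {
    return a+ ++b;
}

// Same characters apart from one space, different results.
static_assert(no_space(1, 2) == 3, "(a++) + b uses the old value of a");
static_assert(with_space(1, 2) == 4, "a + (++b) uses the incremented b");
```

Both spellings compile, and only the whitespace decides whether the ++ attaches to a as a postfix operator or to b as a prefix operator.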

This is actually a case that Cpp2 makes a little better, since there is only postfix ++, no prefix ++.

So Cpp2 does add the *= and &= cases, but it also removes some ++ and -- cases.

Does that make sense?

aswaine commented 3 months ago

Yes, that makes complete sense!