hsutter / cppfront

A personal experimental C++ Syntax 2 -> Syntax 1 compiler


[BUG] `==` and `=` in declarations are confusing. #824

Open msadeqhe opened 11 months ago

msadeqhe commented 11 months ago

Description

Considering @hsutter's comment from discussion #623:

Thanks! I think I've answered this, and addressed it for <<'s interactions with comparisons, in the comment thread starting here: #817 (comment)

Briefly: It's reasonable for << with its built-in bitwise meaning to be high-precedence, but in the 1990s overuse of overloading was popular, including its use for streaming I/O which inherently wants to be low-precedence but can't change the actual operator's precedence... and that overloaded use of << became popular so we keep having this surprise. In fact, I think << is the poster child for precedence issues, because it's the major case that invented a later use of the operator for something very different, that became very popular.

Operator overloading is fine, but I view << for streaming as a warning example about why we shouldn't overuse the feature, and when we do use it we should closely follow the built-in operators' meanings... as Scott Meyers would say (and did), "do as the ints do."

The part "we should closely follow the built-in operators' meanings" is important here.

To Reproduce

Notation `==`

Considering type and namespace aliases:

v32: type == std::vector<i32>;

A typical C++ programmer expects == to return the result of a comparison, but here it defines v32 as an alias of another type instead. So == acts like an assignment in declarations but is a comparison operator in expressions.
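For comparison, here is a minimal Cpp1 sketch of the two meanings side by side. It assumes that Cpp2's i32 corresponds to std::int32_t. In Cpp1, the alias is introduced with a single =, and == appears only as a comparison that yields a bool.

```cpp
#include <cstdint>
#include <type_traits>
#include <vector>

// Assumption: Cpp2's i32 corresponds to std::int32_t.
using i32 = std::int32_t;

// Cpp1 type alias: a single `=` introduces the name v32.
using v32 = std::vector<i32>;

// In Cpp1, `==` is only ever a comparison that produces a bool.
constexpr bool same_type = std::is_same_v<v32, std::vector<std::int32_t>>;
constexpr bool compared = (sizeof(v32) == sizeof(std::vector<std::int32_t>));

static_assert(same_type);
static_assert(compared);
```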

With constexpr functions and variables, == is even more confusing:

fnc1: () -> bool == (x == y); // =='s are not the same!
fnc2: () -> bool == x == y; // Should it be allowed?!

The first == defines fnc1, but the second == compares x and y and yields the result. A typical programmer may well ask why two adjacent == tokens mean such different things.
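Here is a rough Cpp1 counterpart of fnc1, a sketch that assumes x and y are compile-time integer constants (the issue does not say what they are). The declaration-level == corresponds roughly to constexpr, and only the inner == is a comparison.

```cpp
// Hypothetical operands: the issue does not say what x and y are.
constexpr int x = 1;
constexpr int y = 2;

// Cpp2:  fnc1: () -> bool == (x == y);
// Cpp1:  the declaration's `==` roughly becomes `constexpr`;
//        only the inner `==` performs a comparison.
constexpr bool fnc1() { return x == y; }

// A constexpr variable holding the same comparison, spelled with `=`.
constexpr bool var1 = (x == y);

static_assert(fnc1() == var1);
```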

Considering concept declarations in Cpp2:

arithmetic: <T> concept = std::integral<T> || std::floating_point<T>;

Unlike type and namespace aliases, it uses = instead of ==. Cpp2 is designed for C++ programmers, who are used to declaring things with = rather than == in Cpp1:

using v32 = std::vector<i32>;
template<class T> concept yes = true;

Notation `=`

Considering function declaration syntax:

fnc1: () = {}
fnc2: () = something();

If we ask programmers who are not familiar with Cpp2 "What does the fnc1 function do?", they will probably answer that fnc1 returns an empty list {}, because that is natural to programmers: = puts the result of its right side into its left side, and {} is on the right side, so it must have a value.

But this familiar behavior doesn't hold in Cpp2: = doesn't set anything in the example, and {} doesn't have a value either! And void is nothing at all. It is not even like null or nullptr, which are values.

EDIT: Considering this note from this comment:

I mean = {...} doesn't work in Cpp2 generally:
x: TYPE = 10;
x = { /* ... */ } // Syntax error
Because it's meaningless to put a block statement on either side of an assignment.

Looking at fnc2, a typical programmer who is not familiar with Cpp2 thinks that something() has a value and that the function returns the value on the right side of =. The assignment operator in the declaration suggests exactly that. Cpp2 used to have this natural behavior, but @hsutter changed it (because of issue #257) in response to the push-back he got on issues/comments, as he explained here.

Why not bring back the original behavior, but only for function declarations of the form = expression?
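The two readings of fnc2 can be made concrete in Cpp1. In this sketch, something() is a hypothetical stand-in that returns an int. fnc2_expected is what many C++ programmers read into = something();, and fnc2_today reflects the behavior described above, where omitting -> gives a void function that discards the value.

```cpp
#include <type_traits>

// Hypothetical stand-in for something(); the issue does not define it.
constexpr int something() { return 42; }

// fnc1: () = {}   an empty function returning void.
constexpr void fnc1() {}

// Reading many C++ programmers expect: `= expr` hands the value back.
constexpr auto fnc2_expected() { return something(); }

// Behavior described above: no `->`, so void, and the value is discarded.
constexpr void fnc2_today() { static_cast<void>(something()); }

static_assert(std::is_same_v<decltype(fnc2_expected()), int>);
static_assert(std::is_same_v<decltype(fnc2_today()), void>);
```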

Additional context

To solve the issue, Cpp2 could have a simple, well-defined rule that separates statements from expressions in declarations, if the extra = is dropped from them:

fnc1: () { print(10); } // A block statement
fnc2: () = 10; // An expression

fnc1 is defined by a block statement because it has no =, whereas fnc2 is defined by an expression because it has =. This rule splits function declarations into two syntaxes.

First, function declarations (and lambdas) with a block statement have no return type (void) by default:

 fnc1: () -> void { print(0); }
 fnc1: () { print(0); }
 fnc1: () print(0);

Second, function declarations (and lambdas) with an expression after = have a deduced return type (-> _) by default:

 fnc1: () -> _ = 10;
 fnc1: () = 10;
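Under the proposed rule, the two function forms map roughly onto these Cpp1 functions. This sketch omits print, because only the default return types matter here: a block body defaults to void, and = expression defaults to a deduced return type.

```cpp
#include <type_traits>

// fnc1: () { print(0); }   block statement, no `=`: defaults to void.
// (print is left out; only the return type matters here.)
constexpr void fnc1() {}

// fnc1: () = 10;   expression after `=`: defaults to a deduced type (-> _).
constexpr auto fnc2() { return 10; }

static_assert(std::is_same_v<decltype(fnc1()), void>);
static_assert(std::is_same_v<decltype(fnc2()), int>);
```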

Similarly, types and namespaces would look like this (more familiar to C++ programmers):

cls1: type { ... }
cls2: type = cls1;

nsp1: namespace { ... }
nsp2: namespace = nsp1;

In general {} is a block statement and = ... is an expression or a name (or a fully qualified name).

Also for constexpr functions and variables, other solutions such as using a keyword can be considered.

Discussions #742 and #714 are related to this topic. Thanks.

Follow-up Readings

More discussion of how the new syntax may improve Cpp2 is in this comment:

Consistency with Control Structures
Consistency of Variable and Function Declarations (and inspect Expressions)
Avoid Lambda Syntax Ambiguity

"Why is it a bug?" from this comment:

Why does it have to be fixed?
What if we drop = from them?

"Why is the new syntax necessary?" from this comment.

msadeqhe commented 11 months ago

The unification of functions and blocks, as explained by @hsutter here, would change like this:

f: (x: int = init) { ... }    // `x` is a parameter to the function.
f: (x: int = init) statement; // same, { } is implicit.
f: (x: int = init) = expr;    // { return expr; } is implicit.

 : (x: int = init) { ... }    // `x` is a parameter to the lambda.
 : (x: int = init) statement; // same, { } is implicit.
 : (x: int = init) = expr;    // { return expr; } is implicit.

   (x: int = init) { ... }    // `x` is a "let parameter" to the block.
   (x: int = init) statement; // same, { } is implicit.

                   { ... }    // an ordinary block.
                   statement; // same, { } is implicit.

As we can see, the terser lambda syntax : (x) x + 1 breaks the rules, because x + 1 is an expression, not a statement. If = were required in lambda syntax, : (x) = x + 1 seems a better choice, because = improves readability and the return type can still be specified if necessary.
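The three levels in the table above have rough Cpp1 analogues. This sketch assumes init is an int constant and uses x + 1 as the body. The last function imitates a parameterized block with a variable scoped to an inner block, since Cpp1 has no "let parameter" syntax.

```cpp
// Hypothetical initial value for the examples above.
constexpr int init = 1;

// f: (x: int = init) = x + 1;   a named function with a defaulted parameter.
constexpr int f(int x = init) { return x + 1; }

// : (x: int = init) = x + 1   the same shape as an unnamed lambda.
constexpr auto lam = [](int x = init) { return x + 1; };

// (x: int = init) { ... }   x lives only inside the block, like a "let parameter".
constexpr int parameterized_block() {
    int result = 0;
    {
        int x = init;
        result = x + 1;
    }
    return result;
}
```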

EDIT: For example, we can have the following lambdas:

: (x) -> void { print(x); }
: (x) { print(x); }
: (x) print(x)

: (x) -> _ { return x + 1; }

: (x) -> _ = x + 1
: (x) = x + 1

a: (x) = x + 1; is similar to declaring a variable with b: = n + 1;. It improves the consistency of declarations.
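In Cpp1, the proposed lambda forms correspond roughly to the following generic lambdas. This sketch omits print (it is not defined in the thread) and only shows the default return types, alongside a variable initialized from an expression.

```cpp
#include <type_traits>

// : (x) { print(x); }   statement body: void by default (print omitted).
constexpr auto stmt_lambda = [](auto x) -> void { static_cast<void>(x); };

// : (x) = x + 1   expression body: return type deduced.
constexpr auto expr_lambda = [](auto x) { return x + 1; };

// b: = n + 1;   a variable initialized from an expression, for comparison.
constexpr int n = 41;
constexpr auto b = n + 1;
```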

JohelEGP commented 11 months ago

Other related discussions: #634, #761.

As we see, terser lambda syntax : (x) x + 1 breaks the rules, because x + 1 is not a statement, it's an expression.

Up to date: https://github.com/hsutter/cppfront/wiki/Design-note%3A-Defaults-are-one-way-to-say-the-same-thing#from-named-functions-to-lambdas-to-parameterized-blocksstatements-to-ordinary-blocksstatements:

f:(x: int = init) = { ... }     // x is a parameter to the function
f:(x: int = init) = statement;  // same, { } is implicit

 :(x: int = init) = { ... }     // x is a parameter to the lambda
 :(x: int = init) = statement;  // same, { } is implicit

  (x: int = init)   { ... }     // x is a parameter to the block
  (x: int = init)   statement;  // same, { } is implicit

                    { ... }     // an ordinary block
                    statement;  // same, { } is implicit

I think :(x) x + 1 is actually fine, but it's a separate case from explaining blocks. That's what the design note was created for, after all (#714). (In another section further down:)

To me, allowing a generic function f:(i:_) -> _ = { return i+1; } to be spelled f:(i) i+1; is like that...

realgdman commented 11 months ago

It is a statement (in layman's terms, it states a fact) that the left side equals the right side. I don't think it's a problem. = has a meaning like a <- b, i.e. a was something, then its value changes to b. That makes sense in a runtime context, where a variable can, well, vary. It doesn't make much sense in a compile-time context, because we usually cannot modify or remove things from the program, only add facts about it, and after compilation all those facts are set in stone and cannot change, so = is moot. That's all just my humble opinion, of course.

JohelEGP commented 11 months ago

It doesn't make much sense in a compile-time context, because we usually cannot modify or remove things from the program, only add facts about it, and after compilation all those facts are set in stone and cannot change, so = is moot.

The same is true of const and constexpr variables. Would you say those use = for consistency?

msadeqhe commented 11 months ago

@realgdman The a <- b relation doesn't work well with = {}. For example:

fnc1: () -> i32 = x = { return y; }  // ERROR
fnc2: () -> i32 = { return x; } = y; // ERROR

fnc3: () -> i32 = x = y; // OK

In general, = {} doesn't work consistently in other parts of the language. {} cannot be assigned to anything, nor can anything be assigned to it, because {} is not an expression. It's a block statement:

x: = something;
x = { /* ... */ }; // ERROR
x = another;       // OK

On the other hand, = works with expressions, as most programmers expect.

realgdman commented 11 months ago

Does that work because assignment with = returns/has a value? That is natural in C/C++, but there are languages where assignment doesn't return a value. Also, does returning a value from = make much sense at the global, compile-time level? Or do you mean the need for chaining, as in v32: type == myT : type == std::vector?

msadeqhe commented 11 months ago

No, I mean = {...} doesn't work in Cpp2 generally:

x: TYPE = 10;
x = { /* ... */ } // Syntax error

Because it's meaningless to put a block statement on either side of an assignment.

msadeqhe commented 11 months ago

More Discussion

Let's discuss the new syntax if we drop = from declarations.

Consistency with Control Structures

The new syntax improves consistency with control structures.

For example, this is how they look in the old syntax:

if x < 10 {
    return x;
}

f: () -> i32 = {
    return 0;
}

Now compare them in the new syntax:

if x < 10 {
    return x;
}

f: () -> i32 {
    return 0;
}

Neither {} has = before it, because both are block statements.

Consistency of Variable and Function Declarations (and `inspect` Expressions)

For example, this is how they look in the old syntax:

var1: = n + 1;
var2: i32 = n + 1;
fnc1: (x) x + 1; // Inconsistent
fnc2: (x) -> i32 = x + 1;

Syntactically fnc1 is not consistent with other declarations.

Now compare them in the new syntax:

var1: = n + 1;
var2: i32 = n + 1;
fnc1: (x) = x + 1;
fnc2: (x) -> i32 = x + 1;

All of them have = because they are defined with an expression. This is also true within inspect expressions:

var1: = inspect x < 10 -> i32 { is true = x; is _ = x - 10; };

Here = ...; is used, consistent with how we declare variables and functions from expressions. Also, the {} in inspect is a block expression, so it sits inside an expression on one side of an assignment.

Avoid Lambda Syntax Ambiguity

Let's explore other meanings of {} in programming.

Block Expression

{} can be a block expression, which Cpp2 doesn't have yet. Consider an if-expression:

var1: = if x < 10 { x } else { x - 10 };

Those {}s are not block statements. They are block expressions, so it makes sense to put them inside an expression on one side of an assignment.
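Cpp1 has no block expressions either. This sketch shows two common stand-ins for the if-expression above: the conditional operator, and an immediately-invoked lambda when the branches need statements. Neither is proposed Cpp2 syntax.

```cpp
// Cpp1 has no block expressions. Two common stand-ins for
//   var1: = if x < 10 { x } else { x - 10 };

// 1. The conditional operator, when each branch is a single expression.
constexpr int with_conditional(int x) { return x < 10 ? x : x - 10; }

// 2. An immediately-invoked lambda, when the branches need statements.
constexpr int with_iife(int x) {
    const int var1 = [x] {
        if (x < 10) { return x; }
        return x - 10;
    }();
    return var1;
}
```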

List of Items

Also {} can be a list of items:

var1: std::vector<i32> = {1, 2, 3};

That {} is not a block statement, it's an expression. Its value is a list of items.
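In Cpp1, braces to the right of = are already a list of items (a braced-init-list), never a block. This sketch shows that in both an initialization and a plain assignment, where x = {10} assigns the value 10.

```cpp
#include <cstdint>
#include <type_traits>
#include <vector>

// In Cpp1, braces to the right of `=` form a braced-init-list (a list of items).
std::vector<std::int32_t> var1 = {1, 2, 3};

// The same holds in an assignment expression: `x = {10}` assigns a value.
constexpr int reassign() {
    int x = 0;
    x = {10};
    return x;
}

static_assert(std::is_same_v<decltype(var1), std::vector<std::int32_t>>);
static_assert(reassign() == 10);
```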

Conclusion (Old Syntax)

Considering those two concepts together with the function declaration syntax, if Cpp2 supports them, we will have:

// `{}` is a block statement.
fnc1: () -> std::vector<i32> = { /*...*/ }

// `{}` is a block expression or a list of items.
fnc2: () -> std::vector<i32> = { /*...*/ };

The meaning of {} is confusing: a single ; at the end changes it (which doesn't feel right). Now, let's consider the lambda syntax:

// `{}` is a block statement.
call(: () -> std::vector<i32> = { /*...*/ });

// `{}` is a block expression or a list of items.
call(: () -> std::vector<i32> = { /*...*/ });

Lambda syntax doesn't have a ; at the end, so both of them now look the same. Cpp2 has to look inside {} to find out what it means. It's ambiguous for an empty {}, and it harms both readability and toolability.

Conclusion (New Syntax)

If @hsutter approves this bug report with the new syntax, the fnc1 declaration would be invalid. Instead, we would have the fnc3 declaration:

// `{}` is a block expression or a list of items.
fnc2: () -> std::vector<i32> = { /*...*/ };

// `{}` is a block statement.
fnc3: () -> std::vector<i32> { /*...*/ }

Obviously, {} is a block expression or a list of items in fnc2, because it's on the right side of the assignment operator, and {} is a block statement in fnc3. Their lambda forms look like this:

// `{}` is a block expression or a list of items.
call(: () -> std::vector<i32> = { /*...*/ });

// `{}` is a block statement.
call(: () -> std::vector<i32> { /*...*/ });

Fortunately they are now different, so a tool can easily tell what {} means in each of them, and readability improves as well.
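Cpp1 already separates the two meanings of braces by position. A function body is a block statement, and braces after return are a list of items. This sketch uses std::array in place of std::vector so the checks can run at compile time (a choice made only for this example).

```cpp
#include <array>

// In Cpp1, position tells the two meanings of braces apart:
// the function body is a block statement; braces after `return` are a list.
// std::array stands in for std::vector so the checks run at compile time.
constexpr std::array<int, 3> fnc3() { return {1, 2, 3}; }

// `return {};` value-initializes the result: an empty list, not an empty block.
constexpr std::array<int, 3> fnc2() { return {}; }
```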

TL;DR

The point is that the new syntax improves consistency between declarations and control structures, and C++ programmers are familiar with it (because of the strict separation of block statements and expressions). It also avoids ambiguity if Cpp2 supports block expressions and lists of items with {}. Thanks.

JohelEGP commented 11 months ago

Today, name: signature = statement is used for all declarations. You want to drop the = when the statement is a block. All to increase consistency with statements that end in a block.

That makes declarations themselves inconsistent. Also, what happens to declarations that don't always have a block?

f: () -> _ { return g(); }   // Proposed.
f: () -> _ = { return g(); } // Today.
f: () -> _ = g();            // Today.
f: () g();                   // Today.

Although you haven't suggested these, I'll mention them. I'm against dropping = for variables. I'm against using {} for initialization (e.g., from x := 0; to x: {0}; or the further inconsistent x{0};).
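Each spelling of f above describes roughly the same Cpp1 function: a deduced return type whose body returns g(). This sketch assumes a hypothetical g() that returns an int.

```cpp
#include <type_traits>

// Hypothetical g(); the comment does not define it.
constexpr int g() { return 7; }

// Roughly what each spelling of f above describes in Cpp1:
// a deduced return type whose body returns g().
constexpr auto f() { return g(); }
```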

msadeqhe commented 11 months ago

Today, name: signature = statement is used for all declarations.

Today we have the following declarations:

name: signature = block-statement
name: signature = statement
name: signature = expression;
name: parameter-list statement
name: parameter-list expression;

Am I right? So we don't have only one declaration syntax today. I've explained why they are confusing.

That makes declarations themselves inconsistent.

My suggested syntax is:

name: signature block-statement
name: signature = expression;
name: parameter-list statement

Why do you find it inconsistent? It improves Cpp2 code, because block-statement and statement can no longer be confused with expression. It makes the assignment in = expression; a familiar notation that is used everywhere. And = expression; remains the only way to declare variables.

JohelEGP commented 11 months ago

Today, name: signature = statement is used for all declarations.

Today we have the following declarations:
name: signature = block-statement
name: signature = statement
name: signature = expression;
name: parameter-list statement
name: parameter-list expression;
Am I right? So we don't have only one declaration syntax today. I've explained why they are confusing.

The first three are covered by the second one. The last two are https://github.com/hsutter/cppfront/wiki/Design-note%3A-Defaults-are-one-way-to-say-the-same-thing. Are you also refuting defaults when = is involved?

That makes declarations themselves inconsistent.

My suggested syntax is:
name: signature block-statement
name: signature = expression;
name: parameter-list statement
Why do you find it inconsistent?

Not all declarations follow the same syntax name: signature = statement. Also, it's not clear how that supports parameterized expressions like :(x) x.

msadeqhe commented 11 months ago

Are you also refuting defaults when = is involved?

No, they can have defaults:

(From this comment:) EDIT: For example, we can have the following lambdas:
: (x) -> void { print(x); }
: (x) { print(x); }
: (x) print(x)

: (x) -> _ { return x + 1; }

: (x) -> _ = x + 1
: (x) = x + 1
a: (x) = x + 1; is similar to declaring a variable with b: = n + 1;. It improves the consistency of declarations.

= is used consistently, and for expressions only.

Not all declarations follow the same syntax name: signature = statement. Also, it's not clear how that supports parameterized expressions like :(x) x.

Yes, it doesn't treat statements and expressions in the same way. On the other hand, current Cpp2 tries to treat them in the same way. That's why I'm reporting this bug.

AbhinavK00 commented 11 months ago

I simply think that pursuing this issue is not worth it. Personally, while I was initially of the same opinion, cpp2 syntax has grown on me. And I think there's some language that uses == for constants and the way cpp2 uses it kinda makes sense (except for constexpr functions).

Edit: About lists, zig uses .{} syntax for braced initialisers, cpp2 could have the same. Another (better) way could be to have a trailing or leading comma to signify lists/tuples (pls have those in cpp2).

msadeqhe commented 11 months ago

Why is it a bug?

This bug report is about = {} in declarations. I propose fixing it by dropping = from them, which leads to a new syntax.

New Syntax

The ways we declare functions in new syntax are:

// With new syntax
name: signature statement
name: signature = expression;

That's it. statement may be either a block statement ({ print(x); }) or a single statement (print(x);).

In fact, the second one is a variant of the first one: = expression; is syntactic sugar for { return expression; }, so the new syntax can be a single grammar (with different default return types) instead of two.

Old Syntax

Compare this with the current syntax = {}:

// Today in current Cpp2.
name: signature = something
name: parameter-list something

Just like the new syntax, the second is a variant of the first declaration in current Cpp2 (today).

That's it. Today (in current Cpp2) something is confusing to both programmers and tools, because we cannot easily tell whether something is a statement or an expression.

Why does it have to be fixed?

In the current Cpp2's grammar:

// Today in current Cpp2.
name: signature = something
name: parameter-list something

The something part can be either a block statement, a single statement or an expression. So why is something confusing? Let's consider the grammar of current Cpp2 (today) if we don't specify the return type of functions:

// Today in current Cpp2.
name: parameter-list = something
name: parameter-list something

parameter-list means we haven't specified the return type in the signature. So the first something doesn't return a value, but the second something does! It is inconsistent and confusing:

a: (x) = fnc1(); // It doesn't set/return a value!
b: (x) fnc2();     // But it sets/returns a value!

Let's review it again: fnc1() doesn't return a value, but fnc2() does! What?! The = in the first declaration behaves exactly opposite to assignment. It means that, when the return type is not specified, the programmer has to read it like the following sentence to find out the default return type:

"= is written before fnc1(), so it doesn't have a value!".

It's absolutely inconsistent, and unfamiliar to C++ programmers.

On the other hand, if fnc1() has a return value, we have to write it like this instead, explicitly discarding the value with _ =, otherwise it's a compiler error!

a: (x) = _ = fnc1(); // It doesn't set/return a value!
b: (x) fnc2();         // But it sets/returns a value!

How readable/consistent do you find it?

It's the main reason behind this bug report.
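For reference, the two declarations a and b correspond roughly to these Cpp1 function templates. This sketch assumes that fnc1 and fnc2 are hypothetical functions returning int, and models Cpp2's generic parameter x with a template parameter. a returns void and discards the value, while b returns the deduced value.

```cpp
#include <type_traits>

// Hypothetical callees; here fnc1 returns a value, as in the second example.
constexpr int fnc1() { return 1; }
constexpr int fnc2() { return 2; }

// a: (x) = _ = fnc1();   no `->`, so void; the result is explicitly discarded.
template <class T>
constexpr void a(T) { static_cast<void>(fnc1()); }

// b: (x) fnc2();   terse form, so the return type is deduced.
template <class T>
constexpr auto b(T) { return fnc2(); }

static_assert(std::is_same_v<decltype(a(0)), void>);
static_assert(std::is_same_v<decltype(b(0)), int>);
```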

It's also misleading, and inconsistent with how we currently declare variables:

fnc1: (x) = fnc1(); // It doesn't set/return a value!
var1: i32 = fnc2();   // But it sets/returns a value!

Also, = something has no meaning of its own without the signature. Its meaning depends heavily on the function's signature: tools have to check the signature, and programmers have to read it. Otherwise no one can tell whether something is a statement or an expression.

This does offer a higher-level view of the language, since the programmer writes code without caring whether it's a statement or an expression. But it complicates the grammar of Cpp2 for parsing, tooling, and reading the code.

That's the reason I've created this issue as a bug report instead of a feature suggestion.

What if we drop `=` from them?

On the other hand, if we drop = from = something, the grammar of Cpp2 will have distinct syntaxes for block statements and for expressions. I mean:

- = expression; will be an expression everywhere, and it will have a value everywhere.
- {} will be a block statement everywhere if no = precedes it, and it will appear without = consistently in both declarations and control flow.

Because of the distinct syntax and the well-defined separation of statements and expressions, the rules can be specified more simply than without it. This distinct syntax lets tools and programmers see the meaning of the code at a glance. It also allows more features to be added to the language consistently:
- Discussion #836.

Thanks for your patience.

msadeqhe commented 11 months ago

Why is the new syntax necessary?

To fix this bug:

a: (x) = fnc1(); // It doesn't set/return a value!
b: (x) fnc2();     // But it sets/returns a value!
Let's review it again: fnc1() doesn't return a value, but fnc2() does! What?! The = in the first declaration behaves exactly opposite to assignment. It means that, when the return type is not specified, the programmer has to read it like the following sentence to find out the default return type:

"= is written before fnc1(), so it doesn't have a value!".

It's absolutely inconsistent, and unfamiliar to C++ programmers.

We may consider the following solutions to fix the bug of the default return type.

Solution 1

Add = to all declarations:

a: (x) = fnc1(); // ERROR! It doesn't set/return a value!
b: (x) = fnc2();          // But it sets/returns a value!

In this case we have to make _ the default return type for all declarations. So the declaration of a is a compiler error, and we have to write it explicitly like this:

a: (x) -> void = fnc1(); // It doesn't set/return a value!
b: (x) = fnc2();           // But it sets/returns a value!

The problem is that for functions with block statements, the default return type will be deduced too:

(@hsutter's comment:) Quick ack: Making -> _ deduced return types the default for namespace- and type-scope functions (as it is today for auto-declared trailing return syntax functions) feels problematic because it requires callers to be aware of the implementation details. And that's fine and useful, but it's more work at compile time and occasionally more brittle to changes in the function body, so it should be opt-in I think. The reason I made it the default for local anonymous functions (lambdas) at first was because they're inherently local and inline anyway so don't really hit the composable-at-scale issues.

This solution doesn't allow Cpp2 to have different default return types in the terser function syntax when different contexts need them. And we always have to write =, regardless of whether the function is declared with a block statement or an expression.

Solution 2

Swap = in declarations:

a: (x) fnc1(); // It doesn't set/return a value.
b: (x) = fnc2(); // But it sets/returns a value.

It looks natural. The default return type is natural for both of them. The programmer simply looks at = and says:

Hey, it has = before fnc2(), so it must have a value.

That's natural in other parts of the language. It improves readability. It makes = an important part of function declarations. It's consistent with how we declare variables, or assign values to them:

b: (x) = fnc2();

x: i32 = 2;
x = 10;

This change also leads to dropping = from the full function declaration syntax:

a: (x) -> void { fnc1(); }
a: (x) { fnc1(); }
a: (x) fnc1();

b: (x) -> _ { return fnc2(); }
b: (x) -> _ = fnc2();
b: (x) = fnc2();

Also for consistency, other declarations have to be changed:

f: (xy) { }
t: type { }
n: namespace { }

g: (xy) = value;
v: TYPE = value;

It's consistent with control flow, inspect expressions, etc.

Conclusion

To fix this issue, solution 2 is superior to solution 1. Thanks.

msadeqhe commented 11 months ago

The number of upvotes on this comment is a data point suggesting that many Cpp2 programmers find f: (x) = x; more readable and understandable than f: (x) x;:

Maybe it is just me, but I think I am loving the = simply because of readability.

For the same reason I don't like the terse function syntax in the form func: (x) x;

Furthermore, I am concerned about the == syntax for constexpr, because I am immediately thinking of an equality comparison operator and that's simply wrong.

...

That's because programmers expect == and especially = to have a familiar and consistent behavior.

msadeqhe commented 11 months ago

Now consider type and namespace declarations: {} defines a new one (from a block of declarations), while = sets its definition from a previously defined one (from a name or a fully qualified name):

// {} defines a new type:
cls1: type {
    // declarations...
}

// cls2 definition is set from cls1 definition:
cls2: type = cls1;

It's similar to alias declarations with using (and namespace aliases) in Cpp1.
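The Cpp1 forms being compared are sketched below. The struct member and the namespace constant are placeholders added for illustration. A block defines a new type or namespace, and = gives an existing one another name.

```cpp
#include <type_traits>

// cls1: type { ... }   defines a new type (placeholder member for illustration).
struct cls1 {
    int value = 0;
};

// cls2: type = cls1;   another name for the existing definition.
using cls2 = cls1;

// The namespace forms map the same way: a new block versus an alias.
namespace nsp1 { constexpr int answer = 10; }
namespace nsp2 = nsp1;

static_assert(std::is_same_v<cls2, cls1>);
```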

For lambda expressions (unnamed functions), the full lambda syntax will be terser without =:

: (x) -> void { print(x); }
: (x) { print(x); }

That is somewhat similar to lambdas in Cpp1, but today in Cpp2 we have to write them with =:

: (x) -> void = { print(x); }
: (x) = { print(x); }

Discussion #793 is related to this topic.
