require parentheses sometimes to disambiguate confusing operator precedence

PavelVozenilek commented 8 years ago

Look at the cryptic table:

x() x[] x.y
!x -x ~x *x &x ?x %x %%x
x{}
* / %
+ - ++
<< >>
&
^
|
== != < > <= >=
&&
||
?? %%
= *= /= %= += -= <<= >>= &= ^= |= &&= ||=

Why it is all needed? Who will find the time to learn it? I had not internalized C precedence after two decades.

Language Pony does it differently:

Pony takes a different approach and outlaws infix precedence. Any expression where more than one infix operator is used must use parentheses to remove the ambiguity. If you fail to do this the compiler will complain.

andrewrk commented 8 years ago

Interesting idea. This is on the table since it adheres to the principle that being readable is more important than being writable.

Arguments against it are that it would be strange and cumbersome for people coming from a C background. People are used to 3 * 4 + 1 meaning that the multiplication comes first.

emanuelpalm commented 8 years ago

:+1:

andrewrk commented 8 years ago

Some precedence is necessary, such as assignment, function call invocation, array index. Too much parens and the language turns into lisp.

PavelVozenilek commented 8 years ago

Assignment, function call, array index are not infix operators. Could there be any ambiguity with them?

I can imagine having default precedence for basic arithmetic (+-/), this is what little kids get drilled in schools, but IMO it is not worth the effort. One never writes `3 4 + 1in the code, one doessome_variable * another_variable + something_completely_else`, and all advantages of short infix syntax get lost there anyway. Brackets would help to separate it into visual chunks.

PavelVozenilek commented 8 years ago

Also, this feature would make adding user defined operators into the language easier.

thejoshwolfe commented 8 years ago

I think this would be a cool idea for some categories of operators, specifically the ones i can never remember the precedence for. e.g. a && b || c would be a compile error, or a & b | c. i don't think we need to go so far as to outlaw a + b * c though.

thejoshwolfe commented 7 years ago

see also #29

phase commented 7 years ago

My favorite quote from the Pony page is

Most people will remember to do multiplication before addition, but what about left bit shifting versus bitwise and? Sometimes people misremember (or guess wrong) and that leads to bugs. Worse, those bugs are often very hard to spot.

I love the idea for specific parens. It makes the execution crystal clear.

thejoshwolfe commented 7 years ago

The current operator precedence table is inspired by C. But I think Zig can do better than that. For example, & > ^ > | seems pretty silly. Even if you think & > | makes sense (because of the analogy to * and +), it's still pretty confusing that ^ is in the middle of them.

I wanted to write up a proposal here to outline a new precedence table for operators in Zig, but Zig's grammar is a bit more complicated than the current table in doc/langref.md shows, and it's not as simple as rearranging and combining some rows.

At the very least, I'd like to see required parentheses for: &, ^, |, <<, >> intermixed with each other, and for and and or intermixed with each other. There's more to the proposal, but I need to do a more careful analysis of Zig's grammar before posting something comprehensive.

thejoshwolfe commented 7 years ago

Here's the real story on Zig's expression grammar (this does not include grammars such as statements, case declarations, top level declarations, etc.). I'll write a follow up comment documenting what i think the expression grammar should be changed to.

There are three types of operator:

infix operators: always start with x and end with y, where x and y are some expressions
prefix operators: always end with x, where x is some expression
suffix operators: always start with x, where x is some expression
primary expressions: not really operators, because they are either a single token, or they start and end with unambiguous grouping tokens.

A chainable operator is one that you can use like a + b + c, !!x, or a(b)(c), and has a well-defined associativity. All infix chainable operators are left-to-right associative (e.g. a - b - c == (a - b) - c). Chainable prefix and suffix operators have an obvious associativity (right-to-left and left-to-right respectively). A not-chainable operator cannot be used repeatedly, e.g. inline inline a, a{}{}, and a = b = c are all not allowed.

The following is a list of operator groups. The groups are listed here in order of highest-to-lowest precedence. Each item in a group has equal precedence. If a group is chainable, the operators in the group can be intermixed while chaining (e.g. a.b[c]).

primary expressions:

123 - and variations, where 123 is a number literal
"..." - where ... is the characters of the string literal
'...' - where ... is the characters of the char literal
true, false, null, continue, undefined, error, this, unreachable
id - where id is an identifier
@id(...) - where id is an identifier and ... is a parameter list
goto label - where label is a label
break, return - without an expression
(x)
{...} - where ... is a statement list
asm ("...") - and variations, where ... is inline assembly
struct { ... } - and variations, where ... is a container member list
if (a) {...} - and variations, where ... is a statement list[1]
if (a) b else {...} - and variations, where ... is a statement list[1]
while (a) {...} - and variations, where ... is a statement list[1]
while (a) b else {...} - and variations, where ... is a statement list[1]
for (a) {...} - and variations, where ... is a statement list[1]
for (a) b else {...} - and variations, where ... is a statement list
switch (a) {...} - where ... is a case list
comptime {...} - where ... is a statement list[1]

prefix, not chainable:

inline x

suffix, chainable:

x(...) - where ... is a parameter list
x[a] - and variations
x.id - where id is an identifier

prefix, chainable:

!x
-x, -%x
~x
*x
&x - and variations
??x
%%x
?x
%x
[]x - and variations
extern fn() -> x - and variations

suffix, not chainable:

x{...} - where ... is a container init body

infix, chainable:

x * y, x *% y
x / y
x % y
x ** y

infix, chainable:

x + y, x +% y
x - y, x -% y
x ++ y

infix, chainable:

x << y, x <<% y
x >> y

infix, chainable:

x & y

infix, chainable:

x ^ y

infix, chainable:

x | y

infix, not chainable:

x == y - and variations
x < y - and variations

infix, chainable:

x and y

infix, chainable:

x or y

infix, chainable:

x ?? y
x %% y - and variations

infix, not chainable:

x = y - and variations

prefix, chainable:

return x
break x
if (a) x - and variations[1]
if (a) b else x - and variations[1]
while (a) x - and variations[1]
while (a) b else x - and variations[1]
for (a) x - and variations[1]
for (a) b else x - and variations[1]
comptime x[1]

[1]: for the statement-like expressions if, for, while, and comptime, a special exception is made that if a body expression starts with a block (starts with a {), then the body expression ends at the end of that block (at the }). This disallows, for example, if (a) {b} + 1 else {c}.

andrewrk commented 7 years ago

For what it's worth I just removed the inline prefix operator. (See #306)

thejoshwolfe commented 7 years ago

Here's my proposal for the "operator precedence table". It's a lot more complicated than a table.

Before people automatically reject this grammar for being overly complicated, the goal here is to make Zig source code less complicated. My goal for grouping everything the way I did was so that any time a user would ask "Which of these two operators has precedence over the other.", the answer is that they're incompatible, and you need to use parentheses.

Now clearly it's difficult to predict when users will ask that, and so most of the proposal is what I think is a good idea to me, and it's wildly up for debate and discussion. For example, why did I make modulo not heterogeneously chainable with multiplication? Because it seemed like you wouldn't really need that, and it would be more clear with parentheses I guess. It's pretty subjective, so debate on all points of the proposal is welcome and requested, even if you're just giving your opinion.

What follows is a proposal (that has a lot in common with the existing situation), then a summary of changes from the existing grammar, and then some examples.

Proposal

There are three types of operator:

infix operators: always start with x and end with y.
prefix operators: always end with x.
suffix operators: always start with x.
primary expressions: not really operators, because they are either a single token, or they start and end with unambiguous grouping tokens.

The tree below has ordered lists and unordered lists. The ordered lists specify a precedence relation in order of highest to lowest (1 is the highest precedence.). An unordered list specifies a group of items that cannot be intermixed with each other, unless the header for the list specifies otherwise.

Chainability comes in three forms:

not chainable. There is no implicit associativity, so you must use parentheses to combine 3 or more operands. e.g. you can't do a == b == c.
chaininable somehow. This means for infix operators associativity is implicitly left-to-right when combining 3 or more operands with the operator in between them. e.g. a + b + c is implicitly equivalent to (a + b) + c. When a group (specified with an unordered list) of 2 or more items is chainiable, it will be one of these two subtypes:
- heterogeneous chainable. This means all members of the group can be intermixed while chaining. e.g. a + b - c.
- homogeneous chainable. Each member of the group can be chained only with itself. e.g. a & b & c is allowed, but a & b | c is not.

primary expressions, unambiguous:
- 123 - and variations, where 123 is a number literal
- "..." - where ... is the characters of the string literal
- '...' - where ... is the characters of the char literal
- true, false, null, continue, undefined, error, this, unreachable
- id - where id is an identifier
- @id(...) - where id is an identifier and ... is a parameter list
- goto label - where label is a label
- break, return - without an expression
- (x)
- {...} - where ... is a statement list
- asm ("...") - and variations, where ... is inline assembly
- struct { ... } - and variations, where ... is a container member list
- if (a) {...} - and variations, where ... is a statement list[1]
- if (a) b else {...} - and variations, where ... is a statement list[1]
- while (a) {...} - and variations, where ... is a statement list[1]
- while (a) b else {...} - and variations, where ... is a statement list[1]
- for (a) {...} - and variations, where ... is a statement list[1]
- for (a) b else {...} - and variations, where ... is a statement list
- switch (a) {...} - where ... is a case list
- comptime {...} - where ... is a statement list[1]
suffix access and call operators, heterogeneous chainable:
- x(...) - where ... is a parameter list
- x[a] - and variations
- x.id - where id is an identifier
prefix high-precedent operators, heterogeneous chainable:
- !x
- -x, -%x
- ~x
- *x
- &x - and variations
- ??x
- %%x
- ?x
- %x
- []x - and variations
- extern fn() -> x - and variations
suffix container initialization operator, not chainable:
- x{...} - where ... is a container init body
infix value manipulation operators:
- math:
  1. multiplicative:
    - scalar multiplicative, heterogeneous chainable:
      - x * y
      - x *% y
      - x / y
    - modulo, not chainable:
      - x % y
    - repetition, not chainable:
      - x ** y
  2. additive:
    - scalar additive, heterogeneous chainable:
      - x + y
      - x +% y
      - x - y
      - x -% y
    - concatenation, chainable:
      - x ++ y
- bit shifting, not chainable:
  - x << y
  - x <<% y
  - x >> y
- bitwise algebra and monad coercion, homogeneous chainable:
  - x & y
  - x ^ y
  - x | y
  - x ?? y
  - x %% y - and variations
infix comparison operators, not chainable:
- x == y
- x != y
- x < y
- x <= y
- x > y
- x >= y
infix boolean algebra operators, homogeneous chainable:
- x and y
- x or y
prefix control operators, heterogeneous chainable:
- return x
- break x
- if (a) x - and variations[1]
- if (a) b else x - and variations[1]
- while (a) x - and variations[1]
- while (a) b else x - and variations[1]
- for (a) x - and variations[1]
- for (a) b else x - and variations[1]
- comptime x[1]

[1]: for the statement-like expressions if, for, while, and comptime, a special exception is made that if a body expression starts with a block (starts with a {), then the body expression ends at the end of that block (at the }). This disallows, for example, if (a) {b} + 1 else {c}.

Proposed Changes

Most of the "infix value manipulation operators" group is new.
- Bitshifting cannot be chained.
- ++ cannot be mixed with + and the like.
- ** cannot be mixed with * and the like, and cannot be chained.
- % cannot be mixed with other multiplicative operators, and is not chainable.
- None of the bit manipulation operators can be intermixed with the math operators. Not even a << b + c.
- ?? and %% infix operators cannot be intermixed.
and and or cannot be intermixed.
assignment is no longer an expression operator, but a statement construct. That's why it's not in the above tree.
inline is gone; see #306.

Examples

valid: return a and b == c + d * e;
error: x << n + 1. fixed: x << (n + 1).
error: x << 1 << n. fixed: x << (1 << n) (or whatever).
error: " " ** spaces_per_indent ** indent_level. fixed: " " ** (spaces_per_indent * indent_level).
error: foo() ?? bar() %% 0. fixed: foo() ?? (bar() %% 0).

Here's some code that already follows this proposal from std/rand.zig:

prev_value = int(i) +% f *% (prev_value ^ (prev_value >> (int.bit_count - 2)));
mt.array[i] = mt.array[i + m - n] ^ (x >> 1) ^ mag01[x & 0x1];

Here's some code from std/base64.zig that's fine:

dest[out_index] = alphabet[((source[i] & 0x3) <<% 4) | ((source[i + 1] & 0xf0) >> 4)];

But this code from std/base64.zig needs parentheses:

dest[dest_index] = ascii6[source[src_index + 2]] <<% 6 | ascii6[source[src_index + 3]];

Here's a curious case from std/rand.zig, where there are too many parentheses:

return start + (rand_val % range);

Perhaps that last case is a clue that we should not consider % to be a multiplicative math operator, but rather be a value manipulation operator like & that doesn't mix with any other value manipulation operator.

I want ?? to be chainable because of this case:

valid: foo() ?? bar() ?? 0. equivalent to foo() ?? (bar() ?? 0) and (foo() ?? bar()) ?? 0.

But now that I think about that some more, it might not actually be correct. Consider if foo() returns a ??u32. This means (foo() ?? bar()) ?? 0 would unwrap both maybes, but foo() ?? (bar() ?? 0) would not. Hm. So either we need to define ?? as right-to-left associative, or make it not chainable. Similarly with %%.

andrewrk commented 7 years ago

I agree with this proposal. Regarding the question at the end, let's just make it not chainable.

Regarding %, I think it's related enough to / that it should have the same chain properties as /.

thejoshwolfe commented 7 years ago

can you give a usecase for chaining a % b % c? it seems like you'd only ever want at most one % in a chained expression.

andrewrk commented 7 years ago

I meant that you would be able to do return start + rand_val % range

andrewrk commented 7 years ago

I'll take a look at this before 0.2.0. If anyone wants to see this in 0.1.0 feel free to make a pull request.

skyfex commented 6 years ago

I think requiring more parenthesis is wise. Even though I know the precedence in C pretty well, I add parenthesis to make the code easier to read. The proposals here all look pretty good.

I think precedence should be kept for "algebras" though (within them, not between them). That is, a * b + c and a and b or c and a & b | c. If it's kept for algebra on numbers, but not on booleans, then it's more or less arbitrary and it'd probably confuse as many or more people than it would help.

I know that the precedence of logical/bitwise boolean operators isn't high school knowlegde. But I think it should be considered basic programming knowledge that or/and is similar to add/multiply. I mean, if we can't assume a tiny bit of knowledge of boolan algebra from a programmer, especially the kind that is likely to use Zig, what can we assume?

ghost commented 6 years ago

I made a proposal about the same issue here https://github.com/crystal-lang/crystal/issues/5074

based on this blog post https://foonathan.net/blog/2017/07/24/operator-precedence.html

with examples https://github.com/crystal-lang/crystal/issues/5074#issuecomment-352428851

It was generally well accepted but there was no one with sufficient expertise to make it happen

lerno commented 6 years ago

I would actually argue (see #1287) that bitshift operations should have higher precedence so that a >> b + c == (a >> b) + c and also obviously allowing b & 1 << 4 == b & (1 << 4) both are useful when working with bits.

markfirmware commented 4 years ago

Too long won’t read what is going on?

thejoshwolfe commented 4 years ago

see the "Examples" in my long comment for proposed new errors.

markfirmware commented 4 years ago

@thejoshwolfe Thanks, I will look at that.

tauoverpi commented 4 years ago

Having zig fmt automatically fix such expressions with the correct paren nesting zig sees would be incredibly helpful as then there's less of a need to place the parens manually and eliminates the possible learning curve to which expressions need them where. The writing case wouldn't be affected as much apart from having to remove extra parens which almost any editor handles with ease.

I wouldn't mind contributing it once I've gotten my head around the current formatting code and zig in general.

Mouvedia commented 4 years ago

Based on

I agree with this proposal.

I reckon there are 3 steps to this :

introduce the changes proposed by @thejoshwolfe before 1.0.0
have an experimental branch of fmt that automatically adds parentheses for a chosen subset of the operators
evaluate whether we keep it as is or make parentheses a requirement (hence move the precedence logic to fmt)

ryancsh commented 2 years ago

I think removing infix precedence would make things unambiguous, but also harder to read.

By that I mean: the order in which things happen in would be clear, but would also slow down reading considerably. The human mind prefers infix, because infix is how natural languages (English, French, ...) are structured.

To illustrate:

infix:       2         +         3      
            boy       take      book 

prefix:      +         2         3
            take      boy       book 

postfix:     2         3         +
            boy       book      take

A more complicated example:

infix:   2 + 3 + 4 + 5 + 6 + 7 + 8 + 9

prefix:  + + + + + + + 2 3 4 5 6 7 8 9

postfix: 2 3 4 5 6 7 8 9 + + + + + + +

With the prefix and postfix notation, I can't even visually make sure the number of operators is correct. Not only is it hard to write, but also hard to read.

However, I do agree that the entire precedence table is difficult to remember and I find myself either looking it up or adding extra parentheses anyway to make things clear.

Thus, one radical way to solve this might be to entirely get rid of operator precedence for all binary operators (operators that take in two operands).

For example, the following two lines would be equivalent:

a * b + c / d % e % f + g << h & i | j and k or f

(((((((((((a * b) + c) / d) % e) % f) + g) << h) & i) | j) and k) or f)

Simply read from left to right. It's still somewhat awkward because of math classes hammering in the idea that multiplication happens before addition, but I think it's a reasonable compromise between readability and unambiguity. At the very least, I prefer this over the prefix / postfix notation.

However, getting rid of the entire precedence table and going left to right would lead to the following:

// consider this:
x += 3 * 5    

// it converts to
(x += 3) * 5

// to get original meaning we would need to do this
x += (3 * 5)

It's somewhat arguable if this really helps or not. Imagine reading through a 3-d math function full of this:

x += ( x - (y * z));
y += ( a + (b / 8));
z -= ( c | (d & e));

I feel like this might be more readable, but I can see somebody disliking it.

TLDR:

Removing infix notation is not worth it. Slows down reading too much.
As an alternative, consider flattening all binary operators precedence, and instead having the leftmost operator acting first (assuming there are no parentheses).
Simple rules over complex ones.

Srekel commented 2 years ago

A perhaps minor thing, but I feel like division should need parenthesis.

20 / 2 / 2, is it 1 or 5? 20 / 2 * 2, is it 20 or 5?

Maybe I should be 100% certain after programming for 20 years but... I'm not.

ominitay commented 2 years ago

I think it's fair enough to assume basic mathematical knowledge, with division and multiplication first, left-to-right, and then addition and subtraction, also left-to-right. So 20 / 2 / 2 is the same as (20 / 2) / 2, and 20 / 2 * 2 is (20 / 2) * 2

ziglang / zig