Open PavelVozenilek opened 8 years ago
Interesting idea. This is on the table since it adheres to the principle that being readable is more important than being writable.
Arguments against it are that it would be strange and cumbersome for people coming from a C background. People are used to 3 * 4 + 1
meaning that the multiplication comes first.
:+1:
Some precedence is necessary, such as assignment, function call invocation, array index. Too much parens and the language turns into lisp.
Assignment, function call, array index are not infix operators. Could there be any ambiguity with them?
I can imagine having default precedence for basic arithmetic (+-/), this is what little kids get drilled in schools, but IMO it is not worth the effort. One never writes `3 4 + 1in the code, one does
some_variable * another_variable + something_completely_else`, and all advantages of short infix syntax get lost there anyway. Brackets would help to separate it into visual chunks.
Also, this feature would make adding user defined operators into the language easier.
I think this would be a cool idea for some categories of operators, specifically the ones i can never remember the precedence for. e.g. a && b || c
would be a compile error, or a & b | c
. i don't think we need to go so far as to outlaw a + b * c
though.
see also #29
My favorite quote from the Pony page is
Most people will remember to do multiplication before addition, but what about left bit shifting versus bitwise and? Sometimes people misremember (or guess wrong) and that leads to bugs. Worse, those bugs are often very hard to spot.
I love the idea for specific parens. It makes the execution crystal clear.
The current operator precedence table is inspired by C. But I think Zig can do better than that. For example, &
> ^
> |
seems pretty silly. Even if you think &
> |
makes sense (because of the analogy to *
and +
), it's still pretty confusing that ^
is in the middle of them.
I wanted to write up a proposal here to outline a new precedence table for operators in Zig, but Zig's grammar is a bit more complicated than the current table in doc/langref.md
shows, and it's not as simple as rearranging and combining some rows.
At the very least, I'd like to see required parentheses for: &
, ^
, |
, <<
, >>
intermixed with each other, and for and
and or
intermixed with each other. There's more to the proposal, but I need to do a more careful analysis of Zig's grammar before posting something comprehensive.
Here's the real story on Zig's expression grammar (this does not include grammars such as statements, case declarations, top level declarations, etc.). I'll write a follow up comment documenting what i think the expression grammar should be changed to.
There are three types of operator:
x
and end with y
, where x
and y
are some expressionsx
, where x
is some expressionx
, where x
is some expressionA chainable operator is one that you can use like a + b + c
, !!x
, or a(b)(c)
, and has a well-defined associativity. All infix chainable operators are left-to-right associative (e.g. a - b - c == (a - b) - c
). Chainable prefix and suffix operators have an obvious associativity (right-to-left and left-to-right respectively). A not-chainable operator cannot be used repeatedly, e.g. inline inline a
, a{}{}
, and a = b = c
are all not allowed.
The following is a list of operator groups. The groups are listed here in order of highest-to-lowest precedence. Each item in a group has equal precedence. If a group is chainable, the operators in the group can be intermixed while chaining (e.g. a.b[c]
).
primary expressions:
123
- and variations, where 123
is a number literal"..."
- where ...
is the characters of the string literal'...'
- where ...
is the characters of the char literaltrue
, false
, null
, continue
, undefined
, error
, this
, unreachable
id
- where id
is an identifier@id(...)
- where id
is an identifier and ...
is a parameter listgoto label
- where label
is a labelbreak
, return
- without an expression(x)
{...}
- where ...
is a statement listasm ("...")
- and variations, where ...
is inline assemblystruct { ... }
- and variations, where ...
is a container member listif (a) {...}
- and variations, where ...
is a statement list[1]if (a) b else {...}
- and variations, where ...
is a statement list[1]while (a) {...}
- and variations, where ...
is a statement list[1]while (a) b else {...}
- and variations, where ...
is a statement list[1]for (a) {...}
- and variations, where ...
is a statement list[1]for (a) b else {...}
- and variations, where ...
is a statement listswitch (a) {...}
- where ...
is a case listcomptime {...}
- where ...
is a statement list[1]prefix, not chainable:
inline x
suffix, chainable:
x(...)
- where ...
is a parameter listx[a]
- and variationsx.id
- where id
is an identifierprefix, chainable:
!x
-x
, -%x
~x
*x
&x
- and variations??x
%%x
?x
%x
[]x
- and variationsextern fn() -> x
- and variationssuffix, not chainable:
x{...}
- where ...
is a container init bodyinfix, chainable:
x * y
, x *% y
x / y
x % y
x ** y
infix, chainable:
x + y
, x +% y
x - y
, x -% y
x ++ y
infix, chainable:
x << y
, x <<% y
x >> y
infix, chainable:
x & y
infix, chainable:
x ^ y
infix, chainable:
x | y
infix, not chainable:
x == y
- and variationsx < y
- and variationsinfix, chainable:
x and y
infix, chainable:
x or y
infix, chainable:
x ?? y
x %% y
- and variationsinfix, not chainable:
x = y
- and variationsprefix, chainable:
return x
break x
if (a) x
- and variations[1]if (a) b else x
- and variations[1]while (a) x
- and variations[1]while (a) b else x
- and variations[1]for (a) x
- and variations[1]for (a) b else x
- and variations[1]comptime x
[1][1]: for the statement-like expressions if
, for
, while
, and comptime
, a special exception is made that if a body expression starts with a block (starts with a {
), then the body expression ends at the end of that block (at the }
). This disallows, for example, if (a) {b} + 1 else {c}
.
For what it's worth I just removed the inline
prefix operator. (See #306)
Here's my proposal for the "operator precedence table". It's a lot more complicated than a table.
Before people automatically reject this grammar for being overly complicated, the goal here is to make Zig source code less complicated. My goal for grouping everything the way I did was so that any time a user would ask "Which of these two operators has precedence over the other.", the answer is that they're incompatible, and you need to use parentheses.
Now clearly it's difficult to predict when users will ask that, and so most of the proposal is what I think is a good idea to me, and it's wildly up for debate and discussion. For example, why did I make modulo not heterogeneously chainable with multiplication? Because it seemed like you wouldn't really need that, and it would be more clear with parentheses I guess. It's pretty subjective, so debate on all points of the proposal is welcome and requested, even if you're just giving your opinion.
What follows is a proposal (that has a lot in common with the existing situation), then a summary of changes from the existing grammar, and then some examples.
There are three types of operator:
x
and end with y
.x
.x
.The tree below has ordered lists and unordered lists. The ordered lists specify a precedence relation in order of highest to lowest (1 is the highest precedence.). An unordered list specifies a group of items that cannot be intermixed with each other, unless the header for the list specifies otherwise.
Chainability comes in three forms:
a == b == c
.a + b + c
is implicitly equivalent to (a + b) + c
. When a group (specified with an unordered list) of 2 or more items is chainiable, it will be one of these two subtypes:
a + b - c
.a & b & c
is allowed, but a & b | c
is not.123
- and variations, where 123
is a number literal"..."
- where ...
is the characters of the string literal'...'
- where ...
is the characters of the char literaltrue
, false
, null
, continue
, undefined
, error
, this
, unreachable
id
- where id
is an identifier@id(...)
- where id
is an identifier and ...
is a parameter listgoto label
- where label
is a labelbreak
, return
- without an expression(x)
{...}
- where ...
is a statement listasm ("...")
- and variations, where ...
is inline assemblystruct { ... }
- and variations, where ...
is a container member listif (a) {...}
- and variations, where ...
is a statement list[1]if (a) b else {...}
- and variations, where ...
is a statement list[1]while (a) {...}
- and variations, where ...
is a statement list[1]while (a) b else {...}
- and variations, where ...
is a statement list[1]for (a) {...}
- and variations, where ...
is a statement list[1]for (a) b else {...}
- and variations, where ...
is a statement listswitch (a) {...}
- where ...
is a case listcomptime {...}
- where ...
is a statement list[1]x(...)
- where ...
is a parameter listx[a]
- and variationsx.id
- where id
is an identifier!x
-x
, -%x
~x
*x
&x
- and variations??x
%%x
?x
%x
[]x
- and variationsextern fn() -> x
- and variations x{...}
- where ...
is a container init bodyx * y
x *% y
x / y
x % y
x ** y
x + y
x +% y
x - y
x -% y
x ++ y
x << y
x <<% y
x >> y
x & y
x ^ y
x | y
x ?? y
x %% y
- and variationsx == y
x != y
x < y
x <= y
x > y
x >= y
x and y
x or y
return x
break x
if (a) x
- and variations[1]if (a) b else x
- and variations[1]while (a) x
- and variations[1]while (a) b else x
- and variations[1]for (a) x
- and variations[1]for (a) b else x
- and variations[1]comptime x
[1][1]: for the statement-like expressions if
, for
, while
, and comptime
, a special exception is made that if a body expression starts with a block (starts with a {
), then the body expression ends at the end of that block (at the }
). This disallows, for example, if (a) {b} + 1 else {c}
.
++
cannot be mixed with +
and the like.**
cannot be mixed with *
and the like, and cannot be chained.%
cannot be mixed with other multiplicative operators, and is not chainable.a << b + c
.??
and %%
infix operators cannot be intermixed.and
and or
cannot be intermixed.inline
is gone; see #306.return a and b == c + d * e;
x << n + 1
. fixed: x << (n + 1)
.x << 1 << n
. fixed: x << (1 << n)
(or whatever)." " ** spaces_per_indent ** indent_level
. fixed: " " ** (spaces_per_indent * indent_level)
.foo() ?? bar() %% 0
. fixed: foo() ?? (bar() %% 0)
.Here's some code that already follows this proposal from std/rand.zig
:
prev_value = int(i) +% f *% (prev_value ^ (prev_value >> (int.bit_count - 2)));
mt.array[i] = mt.array[i + m - n] ^ (x >> 1) ^ mag01[x & 0x1];
Here's some code from std/base64.zig
that's fine:
dest[out_index] = alphabet[((source[i] & 0x3) <<% 4) | ((source[i + 1] & 0xf0) >> 4)];
But this code from std/base64.zig
needs parentheses:
dest[dest_index] = ascii6[source[src_index + 2]] <<% 6 | ascii6[source[src_index + 3]];
Here's a curious case from std/rand.zig
, where there are too many parentheses:
return start + (rand_val % range);
Perhaps that last case is a clue that we should not consider %
to be a multiplicative math operator, but rather be a value manipulation operator like &
that doesn't mix with any other value manipulation operator.
I want ??
to be chainable because of this case:
foo() ?? bar() ?? 0
. equivalent to foo() ?? (bar() ?? 0)
and (foo() ?? bar()) ?? 0
.But now that I think about that some more, it might not actually be correct. Consider if foo()
returns a ??u32
. This means (foo() ?? bar()) ?? 0
would unwrap both maybes, but foo() ?? (bar() ?? 0)
would not. Hm. So either we need to define ??
as right-to-left associative, or make it not chainable. Similarly with %%
.
I agree with this proposal. Regarding the question at the end, let's just make it not chainable.
Regarding %
, I think it's related enough to /
that it should have the same chain properties as /
.
can you give a usecase for chaining a % b % c
? it seems like you'd only ever want at most one %
in a chained expression.
I meant that you would be able to do return start + rand_val % range
I'll take a look at this before 0.2.0. If anyone wants to see this in 0.1.0 feel free to make a pull request.
I think requiring more parenthesis is wise. Even though I know the precedence in C pretty well, I add parenthesis to make the code easier to read. The proposals here all look pretty good.
I think precedence should be kept for "algebras" though (within them, not between them). That is, a * b + c
and a and b or c
and a & b | c
. If it's kept for algebra on numbers, but not on booleans, then it's more or less arbitrary and it'd probably confuse as many or more people than it would help.
I know that the precedence of logical/bitwise boolean operators isn't high school knowlegde. But I think it should be considered basic programming knowledge that or/and is similar to add/multiply. I mean, if we can't assume a tiny bit of knowledge of boolan algebra from a programmer, especially the kind that is likely to use Zig, what can we assume?
I made a proposal about the same issue here https://github.com/crystal-lang/crystal/issues/5074
based on this blog post https://foonathan.net/blog/2017/07/24/operator-precedence.html
with examples https://github.com/crystal-lang/crystal/issues/5074#issuecomment-352428851
It was generally well accepted but there was no one with sufficient expertise to make it happen
I would actually argue (see #1287) that bitshift operations should have higher precedence so that a >> b + c == (a >> b) + c
and also obviously allowing b & 1 << 4 == b & (1 << 4)
both are useful when working with bits.
Too long won’t read what is going on?
see the "Examples" in my long comment for proposed new errors.
@thejoshwolfe Thanks, I will look at that.
Having zig fmt
automatically fix such expressions with the correct paren nesting zig sees would be incredibly helpful as then there's less of a need to place the parens manually and eliminates the possible learning curve to which expressions need them where. The writing case wouldn't be affected as much apart from having to remove extra parens which almost any editor handles with ease.
I wouldn't mind contributing it once I've gotten my head around the current formatting code and zig in general.
Based on
I reckon there are 3 steps to this :
I think removing infix precedence would make things unambiguous, but also harder to read.
By that I mean: the order in which things happen in would be clear, but would also slow down reading considerably. The human mind prefers infix, because infix is how natural languages (English, French, ...) are structured.
To illustrate:
infix: 2 + 3
boy take book
prefix: + 2 3
take boy book
postfix: 2 3 +
boy book take
A more complicated example:
infix: 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9
prefix: + + + + + + + 2 3 4 5 6 7 8 9
postfix: 2 3 4 5 6 7 8 9 + + + + + + +
With the prefix and postfix notation, I can't even visually make sure the number of operators is correct. Not only is it hard to write, but also hard to read.
However, I do agree that the entire precedence table is difficult to remember and I find myself either looking it up or adding extra parentheses anyway to make things clear.
Thus, one radical way to solve this might be to entirely get rid of operator precedence for all binary operators (operators that take in two operands).
For example, the following two lines would be equivalent:
a * b + c / d % e % f + g << h & i | j and k or f
(((((((((((a * b) + c) / d) % e) % f) + g) << h) & i) | j) and k) or f)
Simply read from left to right. It's still somewhat awkward because of math classes hammering in the idea that multiplication happens before addition, but I think it's a reasonable compromise between readability and unambiguity. At the very least, I prefer this over the prefix / postfix notation.
However, getting rid of the entire precedence table and going left to right would lead to the following:
// consider this:
x += 3 * 5
// it converts to
(x += 3) * 5
// to get original meaning we would need to do this
x += (3 * 5)
It's somewhat arguable if this really helps or not. Imagine reading through a 3-d math function full of this:
x += ( x - (y * z));
y += ( a + (b / 8));
z -= ( c | (d & e));
I feel like this might be more readable, but I can see somebody disliking it.
TLDR:
A perhaps minor thing, but I feel like division should need parenthesis.
20 / 2 / 2, is it 1 or 5? 20 / 2 * 2, is it 20 or 5?
Maybe I should be 100% certain after programming for 20 years but... I'm not.
I think it's fair enough to assume basic mathematical knowledge, with division and multiplication first, left-to-right, and then addition and subtraction, also left-to-right. So 20 / 2 / 2
is the same as (20 / 2) / 2
, and 20 / 2 * 2
is (20 / 2) * 2
Look at the cryptic table:
Why it is all needed? Who will find the time to learn it? I had not internalized C precedence after two decades.
Language Pony does it differently: