Let's finalize the LES3 precedence table

qwertie commented 5 years ago

It's almost time to decide a final precedence table. Some operators are obvious:

Type	Operator
infix	Primary `. fn() arr[] x++ x--`
prefix	Prefix `-x +x *x &x !x`
right-assoc	Exponentiation `a**b`
infix	MulDiv `* / %`
infix	AddSub `+ -`
infix	Comparison `== != > < >= <=`
infix	And `&&`
infix	Or `||`
infix	Assign `=`

Others are... not so obvious as they may first appear.

First, let's briefly review a few general properties of LES operators:

The precedence of arbitrary punctuation is decided based on existing operators, using the first and last character if possible, or the last character otherwise. For example, there is no =~= operator defined, so its precedence will be the same as ==. There is no &+ operator, so its precedence will be the same as +.
An operator normally consists of a sequence of the following characters: ~ ! % ^ * - + = | < > / ? : . &
$ is an operator character but it is parsed specially. In LES2 $ is allowed only as the first character of an operator; in LES3 I'm considering a further restriction, that $ cannot combine with anything else (motivating example in #74: Console.WriteLine$$"Hello {name}!").
LES does not count \ as an operator character; it has no meaning and is reserved for future use, except that double-\ can be used to end a single-line comment before the end of the line. Quote characters ' ` " or the number sign # don't count as operator characters either (the latter is an identifier character).
The left and right side of an operator sometimes have different precedences. The standard example of this is the lambda operator =>: x == y => a == b means x == (y => (a == b)) - the precedence is high on the left and low on the right. In the system of precedence I'm using, if the left and right side have the same precedence then the operator is left-associative, and if the left side has slightly higher precedence, it makes the operator right-associative.
LES operators support immiscibility (inability to mix) which is used to generate errors for expressions that are unclear (see next section).
Except in special cases (namely & | ^ ::) LES2 and LES3 will use the same precedence table so any changes decided here will also apply to LES2.
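As a rough illustration of the fallback rule for undefined operators described above, here is a minimal Python sketch. The `KNOWN` table and its numbers are made up for the example (they are not the real LES precedence values), and `infer_precedence` is a hypothetical helper, not part of any LES library.

```python
# Hypothetical precedence levels for a handful of predefined operators.
# The numbers only express relative order; they are not LES's real values.
KNOWN = {
    "**": 90,
    "*": 80, "/": 80, "%": 80,
    "+": 70, "-": 70,
    "==": 50, "!=": 50, "<": 50, ">": 50, "<=": 50, ">=": 50,
    "&&": 40,
    "||": 30,
    "=": 10,
}

def infer_precedence(op):
    """Pick a precedence for `op` by falling back to existing operators."""
    if op in KNOWN:
        return KNOWN[op]
    # First choice: an existing operator made of the first and last character.
    first_last = op[0] + op[-1]
    if len(op) > 1 and first_last in KNOWN:
        return KNOWN[first_last]
    # Second choice: an existing operator made of the last character alone.
    if op[-1] in KNOWN:
        return KNOWN[op[-1]]
    return None  # any further fallback is not modelled in this sketch

print(infer_precedence("=~=") == KNOWN["=="])  # True
print(infer_precedence("&+") == KNOWN["+"])    # True
```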

Here are some topics to consider (see proposed precedence table at the end):

Fixing past precedence mistakes

In C, you might try to write x & 7 == 0 to check if the bottom 3 bits of x are zero; this doesn't work, of course, because it parses as x & (7 == 0). Since 7 == 0 evaluates to 0, the result is x & 0, which is always zero regardless of x.

LES2 keeps the same precedence used in C but makes & and | and ^ immiscible with comparison operators so that x & 7 == 0 produces an error (though it also produces a syntax tree, should your code wish to ignore the error and continue anyway).

In LES3 I plan to raise the precedence so that x & 7 == 0 has the expected meaning (there will still be two operators, && and and, whose precedence remains below the comparison operators.) The expression will still cause an error, but the error could be filtered out by a compiler based on LES3.
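The two groupings of x & 7 == 0 can be compared numerically. The sketch below is plain Python with the parentheses written out explicitly, so it does not depend on any language's own precedence rules; the function names are invented for the example.

```python
def c_grouping(x):
    # C reads `x & 7 == 0` as `x & (7 == 0)`; the comparison yields 0 or 1.
    return x & int(7 == 0)

def raised_grouping(x):
    # With & above the comparison operators, the same text reads `(x & 7) == 0`.
    return (x & 7) == 0

for x in (5, 8, 16):
    print(x, c_grouping(x), raised_grouping(x))
# 5 0 False
# 8 0 True
# 16 0 True
```

Under the C grouping the result is 0 for every input, which is why the raised precedence is the only one that tests the low three bits.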

The C operators << >> are often used for multiplication and division by a power of two, so it makes sense that they should have a precedence above + -, rather than below as in C. I've assigned a precedence to them in between + - and * / %, and marked them as immiscible with + -.

More controversially, in the current precedence table the precedence of the left-hand side of the assignment operators (=, += etc.) is raised just above &&, ||, ? and :. So, an expression like foo != null && positive := x > 0 && y > 0 will parse like (foo != null) && (positive := (x > 0 && y > 0)). The traditional parsing behavior may be easier to explain since it can be written on a linear precedence table, but on the other hand, the traditional interpretation (foo != null && positive) := (x > 0 && y > 0) is meaningless. Indeed, my original plan was to raise the precedence quite high on the left side, because by similar reasoning it is useless to parse w + x = y + z as (w + x) = (y + z) so it may as well be interpreted as w + (x = y + z). But then I realized that actually (x + v) = (y + z) might be a reasonable parse in some contexts. Consider an equation solver: you could write, for example, .solvefor x in x + v = y + z which should parse as .solvefor (x in ((x + v) = (y + z))).

Perhaps ideally the precedence of = would be raised on both sides, but this would create an incompatibility with many languages including C, C++, C#, and Java. It could be argued that really I should just leave = exactly the same as it is in C/C++/C#. When you want high-precedence assignment, your language can define a quick binding operator instead. Any opinions?

C# defines the null coalescing operator ??. It has a low precedence like && and ||, which doesn't seem useful to me - I think y ?? x != null is most usefully parsed as (y ?? x) != null; y ?? (x != null) can make sense only for nullable-booleans - so I've raised its precedence above comparison. OTOH a low precedence would seem less wrong in languages where the if statement can implicitly cast references to bool...

Arrow operators

Like LES, the Nim language chooses precedence automatically for user-defined operators. Nim decided that "arrow-like" operators, like ->, =>, and ~> (it's unclear if <- and <~ are included) would have the same ultra-low precedence.

However, I decided it would be more useful if different arrows had different precedences. I decided that...

-> <- would have a semi-low precedence between comparisons and && || (it would no longer have high precedence as in C's pointer arrow x->y - for pointer indirection in LES, I would suggest *. instead). A traditional pseudocode notation has been x <- expr for assignment; I have proposed actually using it as a "slide" operator. The very high precedence used in C is, in any case, unsuitable for Haskell-style function type expressions like Int*Int -> String|Null -> String. I can think of reasons to suggest a lower or higher precedence than I did, so I think it's in a good place. These operators are right-associative, which is appropriate both for function types and for "slide" expressions.
:> and <: would have the same precedence as : (i.e. slightly higher than =), right-associative.
~>, <~, |> and <| would all have very low precedence but different associativities, such that a ~> b ~> c |> y <~ x |> q would mean ( (a ~> (b ~> c)) |> (y <~ x) ) |> q. Based on this decision, I used <~ for marking named arguments (in C#, foo: expr is used for named arguments, but in LES3 this doesn't work since foo: Foo is used to create a variable of type Foo, so we need something else).

I think I should change the squiggly arrows in order to make arrow precedence easier to remember.

You see, LES will have a binary ~ operator. I saw this operator in D, where it represents string concatenation. If we assume it's used the same way in LES, the logical precedence for it is below + - so that N+1 ~ " bytes" could be an expression that calculates N+1, converts it to a string and concatenates it with " bytes" (however at present the precedence is Other meaning that the precedence is somewhat undecided - it's above comparison, below exponentiation and immiscible with anything in between.)

It should be easier to remember if ~> and <~ had the same precedence and associativity as ~ (left associative), so I'm inclined to redefine them that way. This creates a small pattern: ~> <~ share a precedence with ~, and :> <: share a precedence with :. But it's a very limited pattern; <- -> aren't based on - and <| |> aren't based on |. Two kinds of arrows are right-associative and two are left-associative.

I created the triangle operators |> <| because I thought there should be something with a lower precedence than word operators, to give DSL authors flexibility in a pinch. I think the precedence here is different from mathematical triangle operators ◁ ▷ like antijoin but I think a super-low precedence op is useful to have available so I'm not inclined to raise its precedence.

Edit: Note: F# has |> and <| operators. The precedence is undocumented but based on examples the precedence is higher than =.

Arguably, its precedence should be even lower than => but currently it's above. I can't make a decision without a use case in mind (edit: I've tentatively made it the only operator whose precedence is below the right side of =>. A potential use case is matchCode, where you want an arbitrary expression on the left and a handler on the right: .matchCode c { $x => $e |> doSomethingWith(x, e) }. This is not a compelling use case though.)

The named-argument operator should have a very low precedence; raising the precedence of <~ makes it unsuitable. <: is appealing to me but its precedence is still too high, unless the left-side precedence of = is kept high enough that arg<: x = y parses as arg<: (x = y). Other candidates are <| and ::=.

Range operators

Ranges like 0..N are conventionally used for slicing and looping, i.e. list[1..list.Length] and .for i in 1..list.Length {...}

But what should its precedence be? There are two obvious choices. One is a high precedence (above * /), so that you can add two ranges (intervals) with a..b + x..y ((a..b) + (x..y)) - yeah, interval arithmetic is a thing, though apparently this is not its usual notation.

The other choice is a low precedence, so that x+1..y means (x+1)..y. For reference, Swift made this latter choice.
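To make the two choices concrete, here is a small precedence-climbing parser in Python that takes the precedence table as a parameter. The tables `HIGH` and `LOW` and their numbers are assumptions made up for this sketch; only the position of `..` relative to the arithmetic operators matters.

```python
import re

# Hypothetical tables: `..` above * / in HIGH, below + - in LOW.
HIGH = {"..": 90, "*": 80, "/": 80, "+": 70, "-": 70}
LOW = {"..": 60, "*": 80, "/": 80, "+": 70, "-": 70}

def parse(text, prec):
    """Parse a flat infix expression into nested (op, left, right) tuples."""
    tokens = re.findall(r"\w+|\.\.|[+\-*/]", text)
    pos = 0

    def expr(min_prec):
        nonlocal pos
        left = tokens[pos]  # operands are single names or numbers
        pos += 1
        # Keep absorbing operators that bind tighter than the caller's operator.
        while pos < len(tokens) and prec[tokens[pos]] > min_prec:
            op = tokens[pos]
            pos += 1
            right = expr(prec[op])
            left = (op, left, right)
        return left

    return expr(0)

print(parse("a..b + x..y", HIGH))  # ('+', ('..', 'a', 'b'), ('..', 'x', 'y'))
print(parse("x+1..y", HIGH))       # ('+', 'x', ('..', '1', 'y'))
print(parse("x+1..y", LOW))        # ('..', ('+', 'x', '1'), 'y')
```

All operators in this sketch are treated as left-associative, which is enough to show how the grouping of the same text changes with the table.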

The second choice creates a tension in autoformatting IDEs, involving spaces. Normally, only the high-precedence operators don't have spaces around them: A.B, A::B, A(B), A++. But ranges look most natural without spaces around the "..". And Visual Studio typically puts spaces around * / + -. So if you write A+B..C, a naive IDE would want to change it to A + B..C, which users can be expected to mentally parse as A + (B..C). That's bad if (A + B)..C is the correct interpretation!

C# apparently hasn't finalized their decision yet but is leaning toward Swift's decision. So tentatively I'm using the precedence from Swift.

Safe-access operators (null-dot)

In LES2 I gave ?. a precedence slightly below . so that a.b?.c.d parses as (a.b)?.(c.d). The reason for this is as follows:

a.b?.c conventionally means "null if a.b is null, otherwise get a.b.c".
It is useless to access c safely only to blithely access d when a.b is null. Therefore the only reasonable interpretation of a.b?.c.d is something like .if (a.b == null) {null} else {a.b.c.d}.
If ?. has a lower precedence than . then it is straightforward to implement the necessary transformation with a macro. But if ?. has the same precedence as . then a.b?.c.d parses as ((a.b)?.c).d, so a macro that hooks onto ?. cannot prevent the access to .d when a.b is null.
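The difference between the two tree shapes can be simulated in Python. In this sketch `null_dot` is a hypothetical stand-in for a macro that hooks onto ?.: it receives the value on the left and a function representing everything on the right. The object layout (`a.b.c.d`) mirrors the example above; nothing here is actual LES or EC# code.

```python
from types import SimpleNamespace

def null_dot(target, rest):
    """Model `target ?. rest`: skip `rest` entirely when target is None."""
    return None if target is None else rest(target)

def low_precedence(a):
    # Tree shape (a.b)?.(c.d): the macro controls the whole chain c.d.
    return null_dot(a.b, lambda b: b.c.d)

def same_precedence(a):
    # Tree shape ((a.b)?.c).d: .d sits outside the macro's reach.
    return null_dot(a.b, lambda b: b.c).d

missing = SimpleNamespace(b=None)
present = SimpleNamespace(b=SimpleNamespace(c=SimpleNamespace(d=42)))

print(low_precedence(present))  # 42
print(low_precedence(missing))  # None
try:
    same_precedence(missing)
except AttributeError as err:
    print("same precedence fails:", err)
```

With the second shape the access to d runs on a null result, which is the failure the lower precedence is meant to prevent.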

But I overlooked something: indexers. In C# you can write my.list?[i].foo which will access the list only if my.list is not null. What should the syntax tree for this look like? By the logic above, we don't want the tree to be `'_?[]`(my.list, i).foo. But what should it be instead?

After a moment's thought, it occurs to me that we don't really need to support that. Users could just write my.list?.[i].foo, which by existing logic will parse as (my.list)?.([i].foo) and then the macro can notice the square brackets and do something special with them. (I haven't got around to supporting this in EC# yet... it would be a bit weird for my.list?[i].Foo to parse as (my.list)?.('[](i).Foo) just so the same macro would work on it... so I guess I should check what Roslyn does and take a different path for EC# than LES3.)

Nonascii operators

LES currently won't understand nonascii characters at all. I've had trouble locating the information from the Unicode consortium that I would need to figure out how to implement support for them.

Other operators?

Perhaps there should be another comparison operator =~, which is used in some languages to mean "matches regular expression"?

I decided to give the : prefix operator the highest possible precedence, based on the way it is used in Ruby to mark symbols. Unless someone can think of a reason it should have some other precedence.

Any other suggestions? Bueller? @jonathanvdc? Please post them here!

Complete operator table

Since LES is designed to fit a wide range of purposes, it has more predefined operators and precedence levels than most languages. Here's an extended table with additional operators: prefix operators $ : >, infix operators ~ ?? & | ^ ? : :> |> <| WORDS words.

Modified based on later decisions

Type	Operator
prefix	`$ :`
infix/suffix	`. fn() arr[] x++ x-- x!! =:`
infix	`?.`
infix	`$` (tentative based on #74)
prefix	`-x +x *x &x !x`
right-assoc	`a**b`
infix	`* / %`
infix	`>> <<` and `x WORD y` (see note)
infix	`+ -`
infix	`.. .<` (highly tentative, see above; includes `... ..<` implicitly)
prefix	`..` `.<` (highly tentative, see above; includes `... ..<` implicitly)
infix	`~ ~>` (tentative, see above; includes `<~` implicitly)
prefix	`<~ ~>` (tentative, see below)
infix	`??` (tentative, see above)
infix	`== != > < >= <= <>`
right-assoc	`<- ->` (tentative, see above)
prefix	`<- ->` (tentative, see below)
infix	`&&`
infix	`|| ^^`
right-assoc	`? : :>` (includes `<:` implicitly)
prefix	`<: :>` (tentative, see below)
right-assoc	`=` (includes compound assignment implicitly)
right-assoc	`x word y` (see note)
infix	`=>` (see note)
prefix	`=> >` (tentative based on #86)
infix	`<| |>`
prefix	`<| |>` (tentative, see below)

Notes:

Uppercase word operators like x MOD y are planned to be immiscible with high-precedence operators between (but not including) comparison operators up to exponentiation (**). A rationale for this is to slightly decrease how much people reading the code have to remember, by forcing the code-writer to use parentheses in many cases.
LES also supports lowercase word operators like Harry met Sally and combo operators like sum approx== 0; the precedence of a combo operator is based on the punctuation alone.
The lambda operator => has a higher precedence on its left side, as mentioned above.
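The left/right precedence scheme can be sketched as a small Pratt-style parser in Python. Each operator gets a (left, right) pair: equal values give left-associativity, a slightly higher left value gives right-associativity, and => is high on the left and low on the right. The numbers in `PRECEDENCE` are invented for this example and are not the real LES values.

```python
import re

# Hypothetical (left precedence, right precedence) pairs for this sketch only.
PRECEDENCE = {
    "==": (50, 50),  # equal on both sides: left-associative
    "=": (21, 20),   # slightly higher on the left: right-associative
    "=>": (80, 10),  # high on the left, low on the right
}

def parse(text):
    """Parse a flat infix expression into nested (op, left, right) tuples."""
    tokens = re.findall(r"\w+|[=<>!&|+\-*/]+", text)
    pos = 0

    def expr(min_prec):
        nonlocal pos
        left = tokens[pos]  # operands are single names or numbers
        pos += 1
        while pos < len(tokens):
            op = tokens[pos]
            left_prec, right_prec = PRECEDENCE[op]
            if left_prec <= min_prec:
                break  # the operator to our left binds at least as tightly
            pos += 1
            left = (op, left, expr(right_prec))
        return left

    return expr(0)

def show(node):
    """Render a tree with explicit parentheses around nested operations."""
    if isinstance(node, str):
        return node
    op, left, right = node
    wrap = lambda n: show(n) if isinstance(n, str) else "(" + show(n) + ")"
    return f"{wrap(left)} {op} {wrap(right)}"

print(show(parse("x == y => a == b")))  # x == (y => (a == b))
print(show(parse("a == b == c")))       # (a == b) == c
print(show(parse("a = b = c")))         # a = (b = c)
```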

qwertie commented 5 years ago

Hmm, could we change the precedence of =>?

The issue is that I'd like to write functions like

.fn Square(x: int|float) -> int|float => x*x;

But the precedence of => is too high on the left side. I think the current precedence comes from just copying the parsing behavior of C# expressions, and that it would be somewhat safe to lower the precedence to be below -> and <-. Not entirely so, since certain plausible expressions like a ?? x => x would change meaning. I'm inclined to be a bit conservative and not change the meaning of c ? f : x => x. Thus the new precedence would remain higher on the left side than the right side.

Another attractive function syntax is

.fn Square(x: int|float): int|float => x*x;

This can be permitted after I change the precedence of =>, but note that I'm proposing it to have a different structure (a: (b => c) vs (a -> b) => c).

qwertie commented 5 years ago

Having introduced a prefix operator > that I anticipate using as a shorthand-lambda, I'm inclined to introduce prefix versions of all arrow operators (~> <~ -> <- <: :> |> <|), each with a precedence based on the corresponding arrow; e.g. since A && B == C -> C == D && E means A && ((B == C) -> (C == D)) && E, A && -> C == D && E would mean A && (-> (C == D)) && E.

qwertie commented 5 years ago

I'm inclined to intentionally not recognize < as a prefix operator, so that a possible future extension of LES could involve something that starts with < (XML literals or something totally different.)

Note that ? = are currently not recognized as prefix operators. However % / are currently recognized as prefixes with the same precedence as + - *.

qwertie commented 5 years ago

Edit: no, . was already not accepted as a prefix operator (though LES2 and EC# both have a prefix .).

Currently, . is a prefix operator, but this makes it more difficult to print LES3 code correctly since the printer has to be careful not to print something like `'.`(foo) + 1 as .foo + 1 which would then be parsed as #foo(+1) (it must print . foo + 1 instead). It's probably for the best if I ban it as a prefix operator.

qwertie commented 5 years ago

Regarding lambda => and ??, I decided to lower the precedence of ?? to be just above the comparison operators (previously it was just above &), and then to re-raise the precedence of => on its left side, to make it below & and above ?? (it was briefly below <- and above &&). In LES3 this arrangement seems good because

You can write foo ?? bar => expr to mean foo ?? (bar => expr) which is more useful than the alternative (assuming one uses => for lambdas and ?? for null-coalescing).
You can define functions either with foo(args) -> A | B => expr or foo(args): A | B => expr, obtaining a similar tree for both: foo(args) -> ((A | B) => expr) and foo(args): ((A | B) => expr).

If => has the lower left-side precedence,

The first expression would mean (foo ?? bar) => expr which seems non-useful.
The expression foo(args) -> A | B => expr would have the structure (foo(args) -> (A | B)) => expr which seems more logical to me, but denoting a return type with : is also popular and would still have the structure foo(args): ((A | B) => expr)

The higher left-precedence of => does have a disadvantage in LES2, however. In LES2 the precedence of & | ^ are illogically lower to match C/C++/C#/Java, which means that "union types" like int|string would need to be written in parentheses to parse correctly in a return-type context: foo(args): (A | B) => expr. I can accept this problem because LES3 is the recommended syntax for creating new programming languages (another option is to use a lower left-precedence of => in LES2 to account for this, but that option would cause an inconsistency when parsing foo ?? bar => expr.)

Lowering the precedence of ?? has a minor disadvantage of putting it in the immiscibility range of & | ^, so an expression like foo ?? bar & 1 will be treated as an error (while producing the tree foo ?? (bar & 1)). [Edit: actually I think it's easy to avoid this immiscibility, but it seems simpler to explain the immiscibility concept if there are no obvious exceptions to the rule. Then again, perhaps the rule itself could be reformulated in a different way that doesn't involve exceptions.]

qwertie commented 4 years ago

C# has finalized their precedence for .. and it is different from Swift: a .. b * c means (a..b)*c in C# but a..(b*c) in Swift. I've decided that it is better to go the C# route because, as I mentioned, A..B without spaces is the most natural format for ranges, but if its precedence is low, we would prefer not to print A..B+1 as A..B + 1 since it would give an impression of (A..B) + 1 as the meaning. This is not a problem if (A..B) + 1 actually is the meaning.

As of 27.x.x, .. and ..< have a precedence matching Swift. In 28.0 I'll change them to match C# (even though C# doesn't have a ..< operator).

qwertie commented 3 years ago

Looks like I left out a definition of <> in the source code. I've decided to give it a precedence slightly higher than other comparison operators because this operator is often used in "spaceship" form (<=>) and it gets a slightly higher precedence in C++... though I notice that in Ruby, greater/less operators actually get a higher precedence than equality/inequality and spaceship operators, so I wonder if that's the way to go, except I don't see the rationale of Ruby's approach.

In any case, in LES I could imagine people using the diamond operator <> itself for something other than comparison, in which case a different precedence is likely better.

qwertie / ecsharp