qwertie / ecsharp

Home of LoycCore, the LES language of Loyc trees, the Enhanced C# parser, the LeMP macro preprocessor, and the LLLPG parser generator.
http://ecsharp.net

Let's finalize the LES3 precedence table #87

Open qwertie opened 5 years ago

qwertie commented 5 years ago

It's almost time to decide a final precedence table. Some operators are obvious:

Type          Name            Operators
infix         Primary         .  fn()  arr[]  x++  x--
prefix        Prefix          -x  +x  *x  &x  !x
right-assoc   Exponentiation  a**b
infix         MulDiv          *  /  %
infix         AddSub          +  -
infix         Comparison      ==  !=  >  <  >=  <=
infix         And             &&
infix         Or              ||
infix         Assign          =

Others are... not so obvious as they may first appear.

First, let's briefly review a few general properties of LES operators:

Here are some topics to consider (see proposed precedence table at the end):

Fixing past precedence mistakes

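In C, you might try to write x & 7 == 0 to check if the bottom 3 bits of x are zero; this doesn't work, of course, because it parses as x & (7 == 0). Since 7 == 0 is 0, the expression is always 0 (false). The similar x & 7 != 0 parses as x & (7 != 0), i.e. x & 1, so it actually checks whether the least-significant bit is one.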

LES2 keeps the same precedence used in C but makes &, | and ^ immiscible with the comparison operators, so that x & 7 == 0 produces an error (though it also produces a syntax tree, should your code wish to ignore the error and continue anyway).

In LES3 I plan to raise the precedence of & | ^ so that x & 7 == 0 has the expected meaning (there will still be two operators, && and the word operator and, whose precedence remains below the comparison operators). The expression will still cause an error, but the error could be filtered out by a compiler based on LES3.
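Python happens to use the grouping LES3 is aiming for here (bitwise & binds tighter than ==), so its standard ast module gives a quick way to see the two readings side by side. This is only an illustration; c_style and les3_style are hypothetical helpers that spell out each grouping by hand.

```python
import ast

def c_style(x):
    # C groups `x & 7 == 0` as `x & (7 == 0)`; 7 == 0 is 0, so the result is always 0
    return x & int(7 == 0)

def les3_style(x):
    # The intended meaning: are the bottom 3 bits of x zero?
    return (x & 7) == 0

# Python parses the unparenthesized text as a comparison whose left side is `x & 7`
tree = ast.parse("x & 7 == 0", mode="eval").body
print(type(tree).__name__, type(tree.left).__name__)  # Compare BinOp

for x in (8, 9, 16):
    print(x, c_style(x), les3_style(x))  # 8 0 True / 9 0 False / 16 0 True
```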

The C operators << >> are often used for multiplication and division by a power of two, so it makes sense that they should have a precedence above + -, rather than below as in C. I've assigned a precedence to them in between + - and * / %, and marked them as immiscible with + -.
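A minimal precedence-climbing sketch can show both halves of this decision: << >> binding tighter than + - but looser than * / %, and an immiscibility check that reports an error while still returning a tree. The binding-power numbers and the explicit IMMISCIBLE set are hypothetical, a simplified model rather than how LES actually represents immiscibility.

```python
import re

# Hypothetical binding powers (left, right); right = left + 1 makes an operator left-associative.
# Only the ordering matters:  * / %  >  << >>  >  + -
BP = {
    "*": (70, 71), "/": (70, 71), "%": (70, 71),
    "<<": (65, 66), ">>": (65, 66),
    "+": (60, 61), "-": (60, 61),
}
# Operator pairs that may not be mixed without parentheses (simplified immiscibility model)
IMMISCIBLE = {frozenset((s, a)) for s in ("<<", ">>") for a in ("+", "-")}

def tokenize(src):
    # Names/numbers, or runs of punctuation such as "<<"
    return re.findall(r"\w+|[^\w\s]+", src)

def parse(tokens, errors, min_bp=0):
    """Precedence climbing over single-token operands; returns (op, left, right) tuples."""
    left = tokens.pop(0)
    while tokens and BP[tokens[0]][0] > min_bp:
        op = tokens.pop(0)
        right = parse(tokens, errors, BP[op][1])
        for child in (left, right):
            if isinstance(child, tuple) and frozenset((op, child[0])) in IMMISCIBLE:
                errors.append(f"cannot mix {child[0]} and {op} without parentheses")
        left = (op, left, right)
    return left

def parse_text(src):
    errors = []
    return parse(tokenize(src), errors), errors

print(parse_text("a << b * c"))  # (('<<', 'a', ('*', 'b', 'c')), [])
print(parse_text("x * 8 >> 2"))  # (('>>', ('*', 'x', '8'), '2'), [])
print(parse_text("a + b << c"))  # a tree is still produced, plus one error
```

As with LES2's handling of x & 7 == 0, the last case yields a usable tree, so a tool could choose to ignore the error.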

More controversially, in the current precedence table the left-hand side of the assignment operators (=, +=, etc.) has a raised precedence, just above &&, ||, ? and :. So, an expression like foo != null && positive := x > 0 && y > 0 will parse like (foo != null) && (positive := (x > 0 && y > 0)). The traditional parsing behavior may be easier to explain since it can be written on a linear precedence table, but on the other hand, the traditional interpretation (foo != null && positive) := (x > 0 && y > 0) is meaningless. Indeed, my original plan was to raise the precedence quite high on the left side, because by similar reasoning it is useless to parse w + x = y + z as (w + x) = (y + z), so it may as well be interpreted as w + (x = y + z). But then I realized that (x + v) = (y + z) might actually be a reasonable parse in some contexts. Consider an equation solver: you could write, for example, .solvefor x in x + v = y + z, which should parse as .solvefor (x in ((x + v) = (y + z))).
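One way to express this asymmetric precedence is to give each operator separate left and right binding powers, as in a Pratt parser. In the sketch below the numbers are hypothetical and only their ordering matters: = and := have a left power just above && and ||, but a very low right power.

```python
import re

# Hypothetical binding powers (left, right). For most operators right = left + 1
# (left-associative). For = and := the left power is raised just above && and ||,
# while the right power is very low, so the right side can contain && and ||.
BP = {
    "||": (20, 21),
    "&&": (30, 31),
    "=": (35, 10), ":=": (35, 10),
    "==": (40, 41), "!=": (40, 41), ">": (40, 41), "<": (40, 41),
    "+": (60, 61), "-": (60, 61),
}

def tokenize(src):
    return re.findall(r"\w+|[^\w\s]+", src)

def parse(tokens, min_bp=0):
    # Precedence climbing: keep absorbing operators whose left power beats min_bp
    left = tokens.pop(0)
    while tokens and BP[tokens[0]][0] > min_bp:
        op = tokens.pop(0)
        left = (op, left, parse(tokens, BP[op][1]))
    return left

print(parse(tokenize("foo != null && positive := x > 0 && y > 0")))
# ('&&', ('!=', 'foo', 'null'), (':=', 'positive', ('&&', ('>', 'x', '0'), ('>', 'y', '0'))))
print(parse(tokenize("w + x = y + z")))  # ('=', ('+', 'w', 'x'), ('+', 'y', 'z'))
print(parse(tokenize("a = b = c")))      # ('=', 'a', ('=', 'b', 'c'))
```

Because the left power of = is still below +, w + x = y + z keeps the (w + x) = (y + z) grouping that the equation-solver example needs, and a right power lower than the left power is what makes = right-associative.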

Perhaps ideally the precedence of = would be raised on both sides, but this would create an incompatibility with many languages including C, C++, C#, and Java. It could be argued that really I should just leave = exactly the same as it is in C/C++/C#. When you want high-precedence assignment, your language can define a quick binding operator instead. Any opinions?

C# defines the null coalescing operator ??. It has a low precedence like && and ||, which doesn't seem useful to me. I think y ?? x != null is most usefully parsed as (y ?? x) != null, since y ?? (x != null) can make sense only for nullable booleans, so I've raised its precedence above comparison. OTOH, a low precedence would seem less wrong in languages where the if statement can implicitly cast references to bool...
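A small Python model makes the type problem with the C# grouping visible. Here None stands in for null, and coalesce is a hypothetical helper modelling a ?? b; neither is part of LES.

```python
def coalesce(a, b):
    # Models `a ?? b`: a unless it is null (None), otherwise b
    return a if a is not None else b

def les3_reading(y, x):
    # (y ?? x) != null: always a plain boolean
    return coalesce(y, x) is not None

def csharp_reading(y, x):
    # y ?? (x != null): yields y itself whenever y is non-null,
    # so the result is only a boolean if y is a nullable boolean
    return coalesce(y, x is not None)

print(les3_reading("name", None), csharp_reading("name", None))  # True name
print(les3_reading(None, None), csharp_reading(None, None))      # False False
```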

Arrow operators

Like LES, the Nim language chooses precedence automatically for user-defined operators. Nim decided that "arrow-like" operators, like ->, =>, and ~> (it's unclear if <- and <~ are included) would have the same ultra-low precedence.

However, I decided it would be more useful if different arrows had different precedences. I decided that...

I think I should change the squiggly arrows in order to make arrow precedence easier to remember.

You see, LES will have a binary ~ operator. I saw this operator in D, where it represents string concatenation. If we assume it's used the same way in LES, the logical precedence for it is below + - so that N+1 ~ " bytes" could be an expression that calculates N+1, converts it to a string and concatenates it with " bytes" (however, at present its precedence is "Other", meaning that it is somewhat undecided: it's above comparison, below exponentiation, and immiscible with anything in between).
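A toy parser and evaluator can show why "below + -" is the natural slot for a concatenation operator. The binding powers are hypothetical, and because the toy tokenizer has no string literals, the suffix " bytes" is supplied through a hypothetical variable named units.

```python
import operator
import re

# Hypothetical binding powers: ~ (concatenation) sits just below + -
BP = {"~": (55, 56), "+": (60, 61), "-": (60, 61), "*": (70, 71)}
ARITH = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def tokenize(src):
    return re.findall(r"\w+|[^\w\s]+", src)

def parse(tokens, min_bp=0):
    left = tokens.pop(0)
    while tokens and BP[tokens[0]][0] > min_bp:
        op = tokens.pop(0)
        left = (op, left, parse(tokens, BP[op][1]))
    return left

def evaluate(tree, env):
    # Leaves are integer literals or variable names looked up in env
    if isinstance(tree, str):
        return int(tree) if tree.isdigit() else env[tree]
    op, a, b = tree
    a, b = evaluate(a, env), evaluate(b, env)
    if op == "~":
        return str(a) + str(b)  # D-style concatenation, converting both sides to strings
    return ARITH[op](a, b)

tree = parse(tokenize("N + 1 ~ units"))
print(tree)                                          # ('~', ('+', 'N', '1'), 'units')
print(evaluate(tree, {"N": 41, "units": " bytes"}))  # 42 bytes
```

If ~ bound tighter than +, the same text would group as N + (1 ~ units), which adds a number to a string.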

It should be easier to remember if ~> and <~ had the same precedence and associativity as ~ (left-associative), so I'm inclined to redefine them that way. This creates a small pattern: ~> <~ share a precedence with ~, and :> <: share a precedence with :. But it's a very limited pattern; <- -> aren't based on -, and <| |> aren't based on |. Two kinds of arrows are right-associative and two are left-associative.

I created the triangle operators |> <| because I thought there should be something with a lower precedence than word operators, to give DSL authors flexibility in a pinch. The precedence here is probably different from that of mathematical triangle operators such as ◁ ▷ (e.g. antijoin), but I think a super-low-precedence operator is useful to have available, so I'm not inclined to raise its precedence.

Edit: Note: F# has |> and <| operators. The precedence is undocumented but based on examples the precedence is higher than =.

Arguably, its precedence should be even lower than => but currently it's above. I can't make a decision without a use case in mind (edit: I've tentatively made it the only operator whose precedence is below the right side of =>. A potential use case is matchCode, where you want an arbitrary expression on the left and a handler on the right: .matchCode c { $x => $e |> doSomethingWith(x, e) }. This is not a compelling use case though.)

The named-argument operator should have a very low precedence; raising the precedence of <~ makes it unsuitable. <: is appealing to me but its precedence is still too high, unless the left-side precedence of = is kept high enough that arg<: x = y parses as arg<: (x = y). Other candidates are <| and ::=.

Range operators

Ranges like 0..N are conventionally used for slicing and looping, e.g. list[1..list.Length] and .for i in 1..list.Length {...}

But what should its precedence be? There are two obvious choices. One is a high precedence (above * /), so that you can add two ranges (intervals) with a..b + x..y ((a..b) + (x..y)) - yeah, interval arithmetic is a thing, though apparently this is not its usual notation.

The other choice is a low precedence, so that x+1..y means (x+1)..y. For reference, Swift made this latter choice.

The second choice creates a tension in autoformatting IDEs, involving spaces. Normally, only the high-precedence operators don't have spaces around them: A.B, A::B, A(B), A++. But ranges look most natural without spaces around the "..". And Visual Studio typically puts spaces around * / + -. So if you write A+B..C, a naive IDE would want to change it to A + B..C, which users can be expected to mentally parse as A + (B..C). That's bad if (A + B)..C is the correct interpretation!

C# apparently hasn't finalized its decision yet but is leaning toward Swift's choice, so tentatively I'm using the precedence from Swift.
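A Pratt-style sketch makes the two candidate precedences concrete. It encodes .. once below + - (the low-precedence choice Swift made) and once above * / (the high-precedence choice); the binding-power numbers are hypothetical and only their order matters.

```python
import re

BASE = {"*": (70, 71), "/": (70, 71), "+": (60, 61), "-": (60, 61)}
# Hypothetical powers for "..": low (below + -, Swift-like) vs high (above * /)
LOW = {**BASE, "..": (50, 51)}
HIGH = {**BASE, "..": (80, 81)}

def tokenize(src):
    return re.findall(r"\w+|[^\w\s]+", src)

def parse(tokens, bp, min_bp=0):
    left = tokens.pop(0)
    while tokens and bp[tokens[0]][0] > min_bp:
        op = tokens.pop(0)
        left = (op, left, parse(tokens, bp, bp[op][1]))
    return left

for src in ("x+1..y", "a..b*c"):
    print(src, parse(tokenize(src), LOW), parse(tokenize(src), HIGH))
# x+1..y ('..', ('+', 'x', '1'), 'y') ('+', 'x', ('..', '1', 'y'))
# a..b*c ('..', 'a', ('*', 'b', 'c')) ('*', ('..', 'a', 'b'), 'c')
```

Under LOW, an IDE that reformats x+1..y as x + 1..y would visually suggest the HIGH grouping, which is the spacing tension described above.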

Safe-access operators (null-dot)

In LES2 I gave ?. a precedence slightly below . so that a.b?.c.d parses as (a.b)?.(c.d). The reason for this is as follows:

  1. a.b?.c conventionally means "null if a.b is null, otherwise get a.b.c".
  2. It is useless to access c safely only to blithely access d when a.b is null. Therefore the only reasonable interpretation of a.b?.c.d is something like .if (a.b == null) {null} else {a.b.c.d}.
  3. If ?. has a lower precedence than . then it is straightforward to implement the necessary transformation with a macro. But if ?. has the same precedence as . then a.b?.c.d parses as ((a.b)?.c).d, so a macro that hooks onto ?. cannot prevent the access to .d when a.b is null.
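The argument in point 3 can be sketched as a tree rewrite. The tuple trees and the names splice and expand_null_dot are hypothetical stand-ins for Loyc trees and a real macro: because ?. binds more loosely than ., the ?. node sees the whole chain c.d and can put it inside the null check.

```python
# Hypothetical tuple trees: (".", lhs, name) for member access, ("?.", lhs, rhs) for null-dot.
# With ?. below ., `a.b?.c.d` parses as ("?.", (".", "a", "b"), (".", "c", "d")).

def splice(target, chain):
    """Re-root a member chain such as c.d onto target, giving target.c.d."""
    if isinstance(chain, tuple) and chain[0] == ".":
        return (".", splice(target, chain[1]), chain[2])
    return (".", target, chain)

def expand_null_dot(tree):
    """Sketch of a macro: (L)?.(R)  ->  .if (L == null) {null} else {L.R}."""
    if isinstance(tree, tuple) and tree[0] == "?.":
        _, lhs, rhs = tree
        return ("if", ("==", lhs, "null"), "null", splice(lhs, rhs))
    return tree

tree = ("?.", (".", "a", "b"), (".", "c", "d"))
print(expand_null_dot(tree))
# ('if', ('==', ('.', 'a', 'b'), 'null'), 'null', ('.', ('.', ('.', 'a', 'b'), 'c'), 'd'))
```

This sketch duplicates lhs for brevity; a real macro would presumably evaluate a.b only once, e.g. via a temporary. With equal precedence the tree would instead be (".", ("?.", (".", "a", "b"), "c"), "d"), and a rewrite of the inner ?. node could not prevent the outer .d.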

But I overlooked something: indexers. In C# you can write my.list?[i].foo which will access the list only if my.list is not null. What should the syntax tree for this look like? By the logic above, we don't want the tree to be `'_?[]`(my.list, i).foo. But what should it be instead?

After a moment's thought, it occurs to me that we don't really need to support that. Users could just write my.list?.[i].foo, which by existing logic will parse as (my.list)?.([i].foo), and then the macro can notice the square brackets and do something special with them. (I haven't got around to supporting this in EC# yet... it would be a bit weird for my.list?[i].Foo to parse as (my.list)?.('[](i).Foo) just so the same macro would work on it... so I guess I should check what Roslyn does and take a different path for EC# than LES3.)
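Noticing the square brackets can be a small extension of the same kind of rewrite. In this sketch the node kinds ("list", x) for a bracketed literal [x] and ("index", obj, x) for an indexer obj[x] are hypothetical labels, not the actual Loyc tree names: when the chain after ?. starts with [i], it is turned into an index on the left-hand side.

```python
# Hypothetical tuple trees: (".", lhs, name) member access, ("list", x) a bracketed
# literal [x], ("index", obj, x) an indexer obj[x], ("?.", lhs, rhs) null-dot.

def splice(target, chain):
    """Re-root the chain onto target; a leading [x] becomes the indexer target[x]."""
    if isinstance(chain, tuple) and chain[0] == ".":
        return (".", splice(target, chain[1]), chain[2])
    if isinstance(chain, tuple) and chain[0] == "list":
        return ("index", target, chain[1])
    return (".", target, chain)

def expand_null_dot(tree):
    """(L)?.(R)  ->  .if (L == null) {null} else {L R}, with R re-rooted onto L."""
    if isinstance(tree, tuple) and tree[0] == "?.":
        _, lhs, rhs = tree
        return ("if", ("==", lhs, "null"), "null", splice(lhs, rhs))
    return tree

# my.list?.[i].foo  parses as  (my.list)?.([i].foo)
tree = ("?.", (".", "my", "list"), (".", ("list", "i"), "foo"))
print(expand_null_dot(tree))
# ('if', ('==', ('.', 'my', 'list'), 'null'), 'null', ('.', ('index', ('.', 'my', 'list'), 'i'), 'foo'))
```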

Nonascii operators

LES currently won't understand nonascii characters at all. I've had trouble locating the information from the Unicode consortium that I would need to figure out how to implement support for them.

Other operators?

Perhaps there should be another comparison operator =~, which is used in some languages to mean "matches regular expression"?

I decided to give the : prefix operator the highest possible precedence, based on the way it is used in Ruby to mark symbols, unless someone can think of a reason it should have some other precedence.

Any other suggestions? Bueller? @jonathanvdc? Please post them here!

Complete operator table

Since LES is designed to fit a wide range of purposes, it has more predefined operators and precedence levels than most languages. Here's an extended table with additional operators: prefix operators $ : >, and infix operators ~ ?? & | ^ ? : :> |> <| WORDS words.

Modified based on later decisions

Type            Operators                        Notes
prefix          $  :
infix/suffix    .  fn()  arr[]  x++  x--  x!!  =:
infix           ?.
infix           $                                tentative, based on #74
prefix          -x  +x  *x  &x  !x
right-assoc     a**b
infix           *  /  %
infix           >>  <<,  x WORD y                see note
infix           +  -
infix           ..  .<                           highly tentative, see above; includes ... ..< implicitly
prefix          ..  .<                           highly tentative, see above; includes ... ..< implicitly
infix           ~  ~>                            tentative, see above; includes <~ implicitly
prefix          <~  ~>                           tentative, see below
infix           ??                               tentative, see above
infix           ==  !=  >  <  >=  <=  <>
right-assoc     <-  ->                           tentative, see above
prefix          <-  ->                           tentative, see below
infix           &&
infix           ||  ^^
right-assoc     ?  :  :>                         includes <: implicitly
prefix          <:  :>                           tentative, see below
right-assoc     =                                includes compound assignment implicitly
right-assoc     x word y                         see note
infix           =>                               see note
prefix          =>  >                            tentative, based on #86
infix           <|  |>
prefix          <|  |>                           tentative, see below

Notes:

qwertie commented 5 years ago

Hmm, could we change the precedence of =>?

The issue is that I'd like to write functions like

.fn Square(x: int|float) -> int|float => x*x;

But the precedence of => is too high on the left side. I think the current precedence comes from just copying the parsing behavior of C# expressions, and that it would be somewhat safe to lower the precedence to be below -> and <-. Not entirely so, since certain plausible expressions like a ?? x => x would change meaning. I'm inclined to be a bit conservative and not change the meaning of c ? f : x => x. Thus the new precedence would remain higher on the left side than the right side.

Another attractive function syntax is

.fn Square(x: int|float): int|float => x*x;

This can be permitted after I change the precedence of =>, but note that I'm proposing that it have a different structure (a: (b => c) vs (a -> b) => c).

qwertie commented 5 years ago

Having introduced a prefix operator > that I anticipate using as a shorthand lambda, I'm inclined to introduce prefix versions of all arrow operators (~> <~ -> <- <: :> |> <|), each with a precedence based on the corresponding arrow; e.g. since A && B == C -> C == D && E means A && ((B == C) -> (C == D)) && E, A && -> C == D && E would mean A && (-> (C == D)) && E.
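One way to give a prefix arrow "a precedence based on the corresponding arrow" is to let it parse its operand with the same right binding power the infix arrow uses. The sketch below assumes that rule; the numbers are hypothetical, with -> right-associative (right power below left power).

```python
import re

# Hypothetical binding powers (left, right); -> is right-associative (right < left)
BP = {"&&": (30, 31), "->": (35, 34), "==": (40, 41)}
PREFIX = {"->"}  # arrows that may also start an operand

def tokenize(src):
    return re.findall(r"\w+|[^\w\s]+", src)

def parse(tokens, min_bp=0):
    tok = tokens.pop(0)
    if tok in PREFIX:
        # The prefix arrow takes as its operand whatever the infix arrow would take on its right
        left = (tok, parse(tokens, BP[tok][1]))
    else:
        left = tok
    while tokens and BP[tokens[0]][0] > min_bp:
        op = tokens.pop(0)
        left = (op, left, parse(tokens, BP[op][1]))
    return left

print(parse(tokenize("A && B == C -> C == D && E")))
# ('&&', ('&&', 'A', ('->', ('==', 'B', 'C'), ('==', 'C', 'D'))), 'E')
print(parse(tokenize("A && -> C == D && E")))
# ('&&', ('&&', 'A', ('->', ('==', 'C', 'D'))), 'E')
```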

qwertie commented 5 years ago

I'm inclined to intentionally not recognize < as a prefix operator, so that a possible future extension of LES could involve something that starts with < (XML literals or something totally different.)

Note that ? = are currently not recognized as prefix operators. However % / are currently recognized as prefixes with the same precedence as + - *.

qwertie commented 5 years ago

Edit: no, . was already not accepted as a prefix operator (though LES2 and EC# both have a prefix .).

Currently, . is a prefix operator, but this makes it more difficult to print LES3 code correctly since the printer has to be careful not to print something like `'.`(foo) + 1 as .foo + 1 which would then be parsed as #foo(+1) (it must print . foo + 1 instead). It's probably for the best if I ban it as a prefix operator.

qwertie commented 5 years ago

Regarding lambda => and ??, I decided to lower the precedence of ?? to be just above the comparison operators (previously it was just above &), and then to re-raise the precedence of => on its left side, to make it below & and above ?? (it was briefly below <- and above &&). In LES3 this arrangement seems good because

If => has the lower left-side precedence,

The higher left-precedence of => does have a disadvantage in LES2, however. In LES2 the precedence of & | ^ is illogically low, to match C/C++/C#/Java, which means that "union types" like int|string would need to be written in parentheses to parse correctly in a return-type context: foo(args): (A | B) => expr. I can accept this problem because LES3 is the recommended syntax for creating new programming languages. (Another option is to use a lower left-precedence for => in LES2 to account for this, but that option would cause an inconsistency when parsing foo ?? bar => expr.)

Lowering the precedence of ?? has a minor disadvantage of putting it in the immiscibility range of & | ^, so an expression like foo ?? bar & 1 will be treated as an error (while producing the tree foo ?? (bar & 1)). [Edit: actually I think it's easy to avoid this immiscibility, but it seems simpler to explain the immiscibility concept if there are no obvious exceptions to the rule. Then again, perhaps the rule itself could be reformulated in a way that doesn't involve exceptions.]
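The LES3 arrangement described in this comment (& | above the left side of =>, which is above ??, which is above comparison, with a very low right side for =>) can be checked with a small Pratt-style sketch. The numbers are hypothetical and only their order matters; ? and : are treated here as right-associative binary operators at one level.

```python
import re

# Hypothetical binding powers (left, right), following the order described above.
BP = {
    "*": (70, 71),
    "&": (60, 61), "|": (60, 61),
    "=>": (55, 5),                 # high on the left, very low on the right
    "??": (50, 51),
    "==": (40, 41), "!=": (40, 41),
    "&&": (30, 31),
    "?": (25, 24), ":": (25, 24),  # right-associative
}

def tokenize(src):
    return re.findall(r"\w+|[^\w\s]+", src)

def parse(tokens, min_bp=0):
    left = tokens.pop(0)
    while tokens and BP[tokens[0]][0] > min_bp:
        op = tokens.pop(0)
        left = (op, left, parse(tokens, BP[op][1]))
    return left

print(parse(tokenize("f : A | B => e")))   # (':', 'f', ('=>', ('|', 'A', 'B'), 'e'))
print(parse(tokenize("a ?? x => x")))      # ('??', 'a', ('=>', 'x', 'x'))
print(parse(tokenize("c ? f : x => x")))   # ('?', 'c', (':', 'f', ('=>', 'x', 'x')))
print(parse(tokenize("x => x * 2 ?? y")))  # ('=>', 'x', ('??', ('*', 'x', '2'), 'y'))
```

The first line shows the union type A | B staying intact as the left side of => without parentheses, and the next two keep the C#-compatible readings of a ?? x => x and c ? f : x => x.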

qwertie commented 4 years ago

C# has finalized its precedence for .. and it is different from Swift's: a .. b * c means (a..b)*c in C# but a..(b*c) in Swift. I've decided that it is better to go the C# route because, as I mentioned, A..B without spaces is the most natural format for ranges, but if its precedence is low, we would prefer not to print A..B+1 as A..B + 1, since that would give the impression that (A..B) + 1 is the meaning. This is not a problem if (A..B) + 1 actually is the meaning.

As of 27.x.x, .. and ..< have a precedence matching Swift's. In 28.0 I'll change them to match C# (even though C# doesn't have a ..< operator).

qwertie commented 3 years ago

Looks like I left out a definition of <> in the source code. I've decided to give it a precedence slightly higher than the other comparison operators, because this operator is often used in "spaceship" form (<=>) and it gets a slightly higher precedence in C++... though I notice that in Ruby, the greater/less operators actually get a higher precedence than the equality/inequality and spaceship operators, so I wonder if that's the way to go, except that I don't see the rationale for Ruby's approach.

In any case, in LES I could imagine people using the diamond operator <> itself for something other than comparison, in which case a different precedence is likely better.