Open qwertie opened 5 years ago
Hmm, could we change the precedence of `=>`?
The issue is that I'd like to write functions like `.fn Square(x: int|float) -> int|float => x*x;`
But the precedence of `=>` is too high on the left side. I think the current precedence comes from just copying the parsing behavior of C# expressions, and that it would be somewhat safe to lower the precedence to be below `->` and `<-`. Not entirely so, since certain plausible expressions like `a ?? x => x` would change meaning. I'm inclined to be a bit conservative and not change the meaning of `c ? f : x => x`. Thus the new precedence would remain higher on the left side than the right side.
Another attractive function syntax is `.fn Square(x: int|float): int|float => x*x;`
This can be permitted after I change the precedence of `=>`, but note that I'm proposing it to have a different structure (`a: (b => c)` vs `(a -> b) => c`).
Having introduced a prefix operator `>` that I anticipate using as a shorthand-lambda, I'm inclined to introduce prefix versions of all arrow operators (`~> <~ -> <- <: :> |> <|`), each with a precedence based on the corresponding arrow; e.g. since `A && B == C -> C == D && E` means `A && ((B == C) -> (C == D)) && E`, `A && -> C == D && E` would mean `A && (-> (C == D)) && E`.
I'm inclined to intentionally not recognize `<` as a prefix operator, so that a possible future extension of LES could involve something that starts with `<` (XML literals or something totally different).
Note that `? =` are currently not recognized as prefix operators. However, `% /` are currently recognized as prefixes with the same precedence as `+ - *`.
Edit: no, `.` was already not accepted as a prefix operator (though LES2 and EC# both have a prefix `.`).
Currently, `.` is a prefix operator, but this makes it more difficult to print LES3 code correctly since the printer has to be careful not to print something like `` `'.`(foo) + 1 `` as `.foo + 1`, which would then be parsed as `#foo(+1)` (it must print `. foo + 1` instead). It's probably for the best if I ban it as a prefix operator.
Regarding lambda `=>` and `??`, I decided to lower the precedence of `??` to be just above the comparison operators (previously it was just above `&`), and then to re-raise the precedence of `=>` on its left side, to make it below `&` and above `??` (it was briefly below `<-` and above `&&`). In LES3 this arrangement seems good because
- it causes `foo ?? bar => expr` to mean `foo ?? (bar => expr)`, which is more useful than the alternative (assuming one uses `=>` for lambdas and `??` for null-coalescing).
- one can write `foo(args) -> A | B => expr` or `foo(args): A | B => expr`, obtaining a similar tree for both: `foo(args) -> ((A | B) => expr)` and `foo(args): ((A | B) => expr)`.
If `=>` has the lower left-side precedence,
- `foo ?? bar => expr` would mean `(foo ?? bar) => expr`, which seems non-useful.
- `foo(args) -> A | B => expr` would have the structure `(foo(args) -> (A | B)) => expr`, which seems more logical to me, but denoting a return type with `:` is also popular and would still have the structure `foo(args): ((A | B) => expr)`.
The higher left-precedence of `=>` does have a disadvantage in LES2, however. In LES2 the precedences of `& | ^` are illogically lower, to match C/C++/C#/Java, which means that "union types" like `int|string` would need to be written in parentheses to parse correctly in a return-type context: `foo(args): (A | B) => expr`. I can accept this problem because LES3 is the recommended syntax for creating new programming languages (another option is to use a lower left-precedence of `=>` in LES2 to account for this, but that option would cause an inconsistency when parsing `foo ?? bar => expr`).
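As a rough illustration of the two arrangements compared above, here is a minimal precedence-climbing sketch in Python. It is not the LES parser: the numeric precedence values, the table names `HIGH_LEFT` and `LOW_LEFT`, and the whitespace-separated token format are all assumptions made for this example. Each operator gets a (left, right) precedence pair, and only the left-side value of `=>` differs between the two tables.

```python
# Hypothetical precedence numbers, chosen only to reproduce the ordering
# described in the discussion; they are not the values used by LES itself.
# Each operator maps to (left precedence, right precedence).
HIGH_LEFT = {
    "|":  (60, 60),
    "=>": (55, 5),   # left side: below "|", above "??"; right side: very low
    "??": (50, 50),
    "==": (40, 40),
    "->": (31, 30),  # left slightly higher than right: right-associative
    "&&": (20, 20),
}
# Same table, except the left side of "=>" sits below "->" and above "&&".
LOW_LEFT = dict(HIGH_LEFT, **{"=>": (25, 5)})


def parse(text, table):
    """Parse whitespace-separated tokens into a fully parenthesized string."""
    tokens = text.split()
    pos = 0

    def expr(min_prec):
        nonlocal pos
        left = tokens[pos]          # atoms are single tokens in this sketch
        pos += 1
        while pos < len(tokens):
            op = tokens[pos]
            left_prec, right_prec = table[op]
            # Equal precedence stops here, which makes an operator with
            # identical left and right values left-associative.
            if left_prec <= min_prec:
                break
            pos += 1
            right = expr(right_prec)
            left = f"({left} {op} {right})"
        return left

    return expr(0)
```

With these assumed numbers, `parse("foo ?? bar => expr", HIGH_LEFT)` gives `(foo ?? (bar => expr))` and `parse("f -> A | B => e", HIGH_LEFT)` gives `(f -> ((A | B) => e))`, while the same inputs with `LOW_LEFT` give `((foo ?? bar) => expr)` and `((f -> (A | B)) => e)`.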
Lowering the precedence of `??` has a minor disadvantage of putting it in the immiscibility range of `& | ^`, so an expression like `foo ?? bar & 1` will be treated as an error (while still producing the tree `foo ?? (bar & 1)`). [Edit: actually I think it's easy to avoid this immiscibility, but it seems simpler to explain the immiscibility concept if there are no obvious exceptions to the rule. Then again, perhaps the rule itself could be reformulated in a different way that doesn't involve exceptions.]
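To make the "error, but still a tree" behavior concrete, here is a small Python sketch of an immiscibility check. It is only an assumption-laden illustration, not how LES implements it: the precedence numbers, the set `IMMISCIBLE_WITH_BITWISE` and the helper `clashes` are hypothetical names invented for this example, and the rule is simplified to "a bitwise operator may not be a direct, unparenthesized operand of `??` or a comparison, or vice versa".

```python
# Hypothetical precedence levels (higher binds tighter); illustrative only.
PRECEDENCE = {"&": 50, "|": 50, "^": 50, "??": 45, "==": 40, "!=": 40}

# Assumption for this sketch: bitwise operators may not be mixed with
# these operators unless parentheses make the grouping explicit.
BITWISE = {"&", "|", "^"}
IMMISCIBLE_WITH_BITWISE = {"??", "==", "!="}


def clashes(a, b):
    """True if operators a and b are immiscible under the assumed rule."""
    return ((a in BITWISE and b in IMMISCIBLE_WITH_BITWISE)
            or (b in BITWISE and a in IMMISCIBLE_WITH_BITWISE))


def parse(text):
    """Return (tree, errors). Operator nodes are (op, left, right) tuples,
    parenthesized groups are ("()", inner) and atoms are plain strings."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0
    errors = []

    def atom():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            inner = expr(0)
            pos += 1  # skip the closing ")"
            return ("()", inner)
        return token

    def expr(min_prec):
        nonlocal pos
        left = atom()
        while pos < len(tokens) and tokens[pos] != ")":
            op = tokens[pos]
            prec = PRECEDENCE[op]
            if prec <= min_prec:
                break
            pos += 1
            right = expr(prec)
            # Report the mixing, but keep building the tree regardless.
            for child in (left, right):
                if (isinstance(child, tuple) and len(child) == 3
                        and clashes(op, child[0])):
                    errors.append(
                        f"'{op}' mixed with '{child[0]}' without parentheses")
            left = (op, left, right)
        return left

    return expr(0), errors
```

In this sketch `parse("foo ?? bar & 1")` returns the tree for `foo ?? (bar & 1)` together with one error message, whereas `parse("foo ?? (bar & 1)")` returns an empty error list because the parenthesized group is never inspected. A compiler built on such a parser could choose to ignore or filter the reported errors.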
C# has finalized their precedence for `..` and it is different from Swift: `a .. b * c` means `(a..b)*c` in C# but `a..(b*c)` in Swift. I've decided that it is better to go the C# route because, as I mentioned, `A..B` without spaces is the most natural format for ranges, but if its precedence is low, we would prefer not to print `A..B+1` as `A..B + 1` since it would give an impression of `(A..B) + 1` as the meaning. This is not a problem if `(A..B) + 1` actually is the meaning.
As of 27.x.x, `..` and `..<` have a precedence matching Swift. In 28.0 I'll change them to match C# (even though C# doesn't have a `..<` operator).
Looks like I left out a definition of `<>` in the source code. I've decided to give it a precedence slightly higher than other comparison operators because this operator is often used in "spaceship" form (`<=>`) and it gets a slightly higher precedence in C++... though I notice that in Ruby, greater/less operators actually get a higher precedence than equality/inequality and spaceship operators, so I wonder if that's the way to go, except I don't see the rationale of Ruby's approach.
In any case, in LES I could imagine people using the diamond operator `<>` itself for something other than comparison, in which case a different precedence is likely better.
It's almost time to decide a final precedence table. Some operators are obvious:
. fn() arr[] x++ x--
-x +x *x &x !x
a**b
* / %
+ -
== != > < >= <=
&&
||
=
Others are... not so obvious as they may first appear.
First, let's briefly review a few general properties of LES operators:
- There is no `=~=` operator defined, so its precedence will be the same as `==`. There is no `&+` operator, so its precedence will be the same as `+`.
- The operator characters are `~ ! % ^ * - + = | < > / ? : . &`.
- `$` is an operator character but it is parsed specially. In LES2 `$` is allowed only as the first character of an operator; in LES3 I'm considering a further restriction, that `$` cannot combine with anything else (motivating example in #74: `Console.WriteLine$$"Hello {name}!"`).
- `\` is not treated as an operator character; it has no meaning and is reserved for future use, except that double-`\` can be used to end a single-line comment before the end of the line. Quote characters `' ` "` or the number sign `#` don't count as operator characters either (the latter is an identifier character).
- The precedence of an operator can differ between its left and right sides, as with `=>`: `x == y => a == b` means `x == (y => (a == b))` - the precedence is high on the left and low on the right. In the system of precedence I'm using, if the left and right side have the same precedence then the operator is left-associative, and if the left side has slightly higher precedence, it makes the operator right-associative.
- (Apart from `& | ^ ::`) LES2 and LES3 will use the same precedence table so any changes decided here will also apply to LES2.
Here are some topics to consider (see proposed precedence table at the end):
Fixing past precedence mistakes
In C, you might try to write `x & 7 == 0` to check if the bottom 3 bits of `x` are zero; this doesn't work, of course, because it parses as `x & (7 == 0)`, i.e. `x & 0`, so the result is always zero regardless of `x`.
LES2 keeps the same precedence used in C but makes
`&` and `|` and `^` immiscible with comparison operators so that `x & 7 == 0` produces an error (though it also produces a syntax tree, should your code wish to ignore the error and continue anyway).
In LES3 I plan to raise the precedence so that `x & 7 == 0` has the expected meaning (there will still be two operators, `&&` and `and`, whose precedence remains below the comparison operators). The expression will still cause an error, but the error could be filtered out by a compiler based on LES3.
The C operators `<< >>` are often used for multiplication and division by a power of two, so it makes sense that they should have a precedence above `+ -`, rather than below as in C. I've assigned a precedence to them in between `+ -` and `* / %`, and marked them as immiscible with `+ -`.
More controversially, in the current precedence table the left-hand side of the assignment operators (`=`, `+=` etc) is raised just above `&&`, `||`, `?` and `:`. So, an expression like `foo != null && positive := x > 0 && y > 0` will parse like `(foo != null) && (positive := (x > 0 && y > 0))`. The traditional parsing behavior may be easier to explain since it can be written on a linear precedence table, but on the other hand, the traditional interpretation `(foo != null && positive) := (x > 0 && y > 0)` is meaningless. Indeed, my original plan was to raise the precedence quite high on the left side, because by similar reasoning it is useless to parse `w + x = y + z` as `(w + x) = (y + z)` so it may as well be interpreted as `w + (x = y + z)`. But then I realized that actually `(x + v) = (y + z)` might be a reasonable parse in some contexts. Consider an equation solver: you could write, for example, `.solvefor x in x + v = y + z`, which should parse as `.solvefor (x in ((x + v) = (y + z)))`.
Perhaps ideally the precedence of `=` would be raised on both sides, but this would create an incompatibility with many languages including C, C++, C#, and Java. It could be argued that really I should just leave `=` exactly the same as it is in C/C++/C#. When you want high-precedence assignment, your language can define a quick binding operator instead. Any opinions?
C# defines the null coalescing operator `??`. It has a low precedence like `&&` and `||`, which doesn't seem useful to me - I think `y ?? x != null` is most usefully parsed as `(y ?? x) != null`; `y ?? (x != null)` can make sense only for nullable-booleans - so I've raised its precedence above comparison. OTOH a low precedence would seem less wrong in languages where the `if` statement can implicitly cast references to bool...
Arrow operators
Like LES, the Nim language chooses precedence automatically for user-defined operators. Nim decided that "arrow-like" operators, like `->`, `=>`, and `~>` (it's unclear if `<-` and `<~` are included) would have the same ultra-low precedence.
However, I decided it would be more useful if different arrows had different precedences. I decided that...
- `-> <-` would have a semi-low precedence between comparisons and `&& ||` (it would no longer have high precedence as in C's pointer arrow `x->y` - for pointer indirection in LES, I would suggest `*.` instead). A traditional pseudocode notation has been `x <- expr` for assignment; I have proposed actually using it as a "slide" operator. The very high precedence used in C is, in any case, unsuitable for Haskell-style function type expressions like `Int*Int -> String|Null -> String`. I can think of reasons to suggest a lower or higher precedence than I did, so I think it's in a good place. These operators are right-associative, which is appropriate both for function types and for "slide" expressions.
- `:>` and `<:` would have the same precedence as `:` (i.e. slightly higher than `=`), right-associative.
- `~>`, `<~`, `|>` and `<|` would all have very low precedence but different associativities, such that `a ~> b ~> c |> y <~ x |> q` would mean `( (a ~> (b ~> c)) |> (y <~ x) ) |> q`. Based on this decision, I used `<~` for marking named arguments (in C# `foo: expr` is used for named arguments, but in LES3 this doesn't work since `foo: Foo` is used to create a variable of type `Foo`, so we need something else).
I think I should change the squiggly arrows in order to make arrow precedence easier to remember.
You see, LES will have a binary `~` operator. I saw this operator in D, where it represents string concatenation. If we assume it's used the same way in LES, the logical precedence for it is below `+ -` so that `N+1 ~ " bytes"` could be an expression that calculates `N+1`, converts it to a string and concatenates it with `" bytes"` (however at present the precedence is `Other`, meaning that the precedence is somewhat undecided - it's above comparison, below exponentiation and immiscible with anything in between).
It should be easier to remember if `~>` and `<~` had the same precedence and associativity as `~` (left associative), so I'm inclined to redefine them that way. This creates a small pattern: `~> <~` shares a precedence with `~`, and `:> <:` shares a precedence with `:`. But it's a very limited pattern; `<- ->` aren't based on `-` and `<| |>` aren't based on `|`. Two kinds of arrows are right-associative and two are left-associative.
I created the triangle operators `|> <|` because I thought there should be something with a lower precedence than word operators, to give DSL authors flexibility in a pinch. I think the precedence here is different from mathematical triangle operators ◁ ▷ like antijoin but I think a super-low precedence op is useful to have available so I'm not inclined to raise its precedence.
Edit: Note: F# has `|>` and `<|` operators. The precedence is undocumented but based on examples the precedence is higher than `=`.
Arguably, its precedence should be even lower than `=>` but currently it's above. I can't make a decision without a use case in mind (edit: I've tentatively made it the only operator whose precedence is below the right side of `=>`. A potential use case is `matchCode`, where you want an arbitrary expression on the left and a handler on the right: `.matchCode c { $x => $e |> doSomethingWith(x, e) }`. This is not a compelling use case though.)
The named-argument operator should have a very low precedence; raising the precedence of `<~` makes it unsuitable. `<:` is appealing to me but its precedence is still too high, unless the left-side precedence of `=` is kept high enough that `arg<: x = y` parses as `arg<: (x = y)`. Other candidates are `<|` and `::=`.
Range operators
Ranges like `0..N` are conventionally used for slicing and looping, e.g. `list[1..list.Length]` and `.for i in 1..list.Length {...}`
But what should its precedence be? There are two obvious choices. One is a high precedence (above `* /`), so that you can add two ranges (intervals) with `a..b + x..y` (`(a..b) + (x..y)`) - yeah, interval arithmetic is a thing, though apparently this is not its usual notation.
The other choice is a low precedence, so that `x+1..y` means `(x+1)..y`. For reference, Swift made this latter choice.
The second choice creates a tension in autoformatting IDEs, involving spaces. Normally, only the high-precedence operators don't have spaces around them: `A.B`, `A::B`, `A(B)`, `A++`. But ranges look most natural without spaces around the `..`. And Visual Studio typically puts spaces around `* / + -`. So if you write `A+B..C`, a naive IDE would want to change it to `A + B..C`, which users can be expected to mentally parse as `A + (B..C)`. That's bad if `(A + B)..C` is the correct interpretation!
C# apparently hasn't finalized their decision yet but is leaning toward Swift's decision. So tentatively I'm using the precedence from Swift.
Safe-access operators (null-dot)
In LES2 I gave `?.` a precedence slightly below `.` so that `a.b?.c.d` parses as `(a.b)?.(c.d)`. The reason for this is as follows:
- `a.b?.c` conventionally means "null if `a.b` is null, otherwise get `a.b.c`".
- It makes little sense to access `c` safely only to blithely access `d` when `a.b` is null. Therefore the only reasonable interpretation of `a.b?.c.d` is something like `.if (a.b == null) {null} else {a.b.c.d}`.
- If `?.` has a lower precedence than `.` then it is straightforward to implement the necessary transformation with a macro. But if `?.` has the same precedence as `.` then `a.b?.c.d` parses as `((a.b)?.c).d`, so a macro that hooks onto `?.` cannot prevent the access to `.d` when `a.b` is null.
But I overlooked something: indexers. In C# you can write `my.list?[i].foo` which will access the list only if `my.list` is not null. What should the syntax tree for this look like? By the logic above, we don't want the tree to be `` `'_?[]`(my.list, i).foo ``. But what should it be instead?
After a moment's thought, it occurs to me that we don't really need to support that. Users could just write `my.list?.[i].foo`, which by existing logic will parse as `(my.list)?.([i].foo)` and then the macro can notice the square brackets and do something special with them. (I haven't got around to supporting this in EC# yet... it would be a bit weird for `my.list?[i].Foo` to parse as `` (my.list)?.(`'[]`(i).Foo) `` just so the same macro would work on it... so I guess I should check what Roslyn does and take a different path for EC# than LES3.)
Nonascii operators
LES currently won't understand nonascii characters at all. I've had trouble locating the information from the Unicode consortium that I would need to figure out how to implement support for them.
Other operators?
Perhaps there should be another comparison operator `=~`, which is used in some languages to mean "matches regular expression"?
I decided to give the `:` prefix operator the highest possible precedence, based on the way it is used in Ruby to mark symbols. Unless someone can think of a reason it should have some other precedence.
Any other suggestions? Bueller? @jonathanvdc? Please post them here!
Complete operator table
Since LES is designed to fit a wide range of purposes, it has more predefined operators and precedence levels than most languages. Here's an extended table with additional operators - prefix operators `$ : >`, infix operators `~ ?? & | ^ ? : :> |> <| WORDS words`.
Modified based on later decisions
`$ :`
`. fn() arr[] x++ x-- x!! =:`
`?.`
`$` (tentative based on #74)
`-x +x *x &x !x`
`a**b`
`* / %`
`>> <<` and `x WORD y` (see note)
`+ -`
`.. .<` (highly tentative, see above; includes `... ..<` implicitly)
`..` `.<` (highly tentative, see above; includes `... ..<` implicitly)
`~ ~>` (tentative, see above; includes `<~` implicitly)
`<~ ~>` (tentative, see below)
`??` (tentative, see above)
`== != > < >= <= <>`
`<- ->` (tentative, see above)
`<- ->` (tentative, see below)
`&&`
`|| ^^`
`? : :>` (includes `<:` implicitly)
`<: :>` (tentative, see below)
`=` (includes compound assignment implicitly)
`x word y` (see note)
`=>` (see note)
`=> >` (tentative based on #86)
`<| |>`
`<| |>` (tentative, see below)
Notes:
- Word operators like `x MOD y` are planned to be immiscible with high-precedence operators between (but not including) comparison operators up to exponentiation (`**`). A rationale for this is to slightly decrease how much people reading the code have to remember, by forcing the code-writer to use parentheses in many cases.
- `x word y` covers expressions like `Harry met Sally` and combo operators like `sum approx== 0`; the precedence of a combo operator is based on the punctuation alone.
- `=>` has a higher precedence on its left side, as mentioned above.