Closed dgrunwald closed 9 years ago
I am planning to fix this myself, but I need input on whether to increase the operator precedence so that r == 1..2
is valid.
cc @aturon
I think this is a candidate for a nomination as well.
cc @nick29581
Nominating.
cc @nikomatsakis @alexcrichton
May be related to https://github.com/rust-lang/rust/issues/20830
@alexcrichton: That looks like a different bug in the same line of code.
I agree about the current precedence being wrong, and think that the suggested position is probably good, but I am very wary of the "restriction no dots" rules. Every restriction is a parser bug until proven otherwise, from my POV, and I am not convinced that we need RESTRICTION_NO_DOTS
at all anymore.
@nikomatsakis: I think some kind of parser special case will always be necessary if we want to allow v[0..a+b]
, avoid grammar inconsistencies between ..a
and a..
, and not have a combinatorial explosion in the complexity of the formal grammar at the same time.
I know this problem from when I tried writing a C# parser using a parser generator. It turns out that the Microsoft C# compiler is a hand-written recursive-descent parser using the shunting yard algorithm to parse binops. But not all C# binops are true binary operators taking two expressions: the as
operator takes a type on the RHS, yet has lower precedence than some other binary operators. This results in some strange effects -- a + b as c + d
is valid in C# and parses as ((a + b) as c) + d
, even though the precedence of as
is lower than that of +
. As far as I know, there is no way to write a formal grammar that accepts the same expressions as the MS C# compiler without duplicating all expression-related productions.
I'd prefer if rust does not make the same mistake as C#, so any prefix- and suffix- operators must be forbidden from being used in a context which conflicts with their operator precedence, even if operator precedence is unnecessary to disambiguate the expression. As far as I know, ..
is currently the only prefix-/suffix- operator where this problem occurs, since as
and all unary operators have a higher precedence than any binary operator.
Here's how I imagine the formal grammar:
BitOrExpr = BitXorExpr | BitOrExpr "^" BitXorExpr
RangeExpr = BitOrExpr | ".." BitOrExpr | BitOrExpr ".." | BitOrExpr ".." BitOrExpr
CompExpr = RangeExpr | RangeExpr ("==" | "!=" | "<" | ">" | "<=" | ">=") RangeExpr
AndExpr = CompExpr | AndExpr "&&" CompExpr
This means that "^" and "&&" are left-associative; comparison operators don't have any associativity (since RFC 558); and the range expression doesn't have any associativity either.
I hope that this can be implemented without using a restrictions
bitflag in the parser. But if not, I'd prefer special cases in the parser over special cases in the grammar.
Changing the precedence level as I suggested actually creates new syntactic ambiguities:
a == b.. && c
currently is... actually a syntax error because can_begin_expr
doesn't handle &&
(#20901), but if that is fixed, it would be:
(a == b)..(&&c)
With my suggestion, it would be ambiguous between
a == (b..(&&c))
and
a == (b..) && c
I'm starting to think we should keep the precedence as-is, so that ..
must always be enclosed in parenthesis when used in a more complex expression, and only fix the inconsistencies between the prefix form and the other two forms.
On Sat, Jan 10, 2015 at 09:56:42AM -0800, Daniel Grunwald wrote:
@nikomatsakis: I think some kind of parser special case will always be necessary if we want to allow
v[0..a+b]
, avoid grammar inconsistencies between..a
anda..
, and not have a combinatorial explosion in the complexity of the formal grammar at the same time.
Can you spell this out a bit more?
On Sat, Jan 10, 2015 at 01:43:35PM -0800, Daniel Grunwald wrote:
With my suggestion, it would be ambiguous between
a == (b..(&&c))
anda == (b..) && c
Why is this not just an argument for a more-refined precedence (like,
we pick one interpretation and run with it). Only one is remotely
sensible (b..(&&c)
) so that seems to be the one to use. That said,
I've not sat down and looked at the table of precedences to see how
that would fit.
Why is this not just an argument for a more-refined precedence (like, we pick one interpretation and run with it). Only one is remotely sensible (
b..(&&c)
) so that seems to be the one to use.
True, I guess we can just pick one interpretation and use it.
Though I would have picked the other interpretation: if r1 == 1.. && r2 == ..2 {}
should probably be if r1 == (1..) && r2 == (..2) {}
The other choice would disambiguate it as if r1 == (1.. (&& r2)) == ..2 {}
.
I'm starting to think we should keep the precedence as-is, so that .. must always be enclosed in parenthesis
When I wrote that, I was just concerned about the complexity of having an optional expression in a chain of binops, but I think probably isn't as bad as I initially thought.
The only binops that potentially conflict with the initial token in an expression are: &&
, ||
, *
, &
and ..
itself. (UFCS will add <
and <<
)
Only those with a precedence lower than ..
are problematic, so that's &&
and ||
only.
I don't see much use for ranges of double references or ranges of lambdas, so I'd add the following rule to disambiguate the formal grammar I've given:
If ..
is followed by a token that can be interpreted both as start of expression or binary operator, it will be parsed as:
..
..
This corresponds to the disambiguation of if r1 == 1.. && r2 == ..2 {}
as if r1 == (1..) && r2 == (..2) {}
, but 1..*i
will be 1..(*i)
, not (1..)*(i)
.
for i in 0 .. n {}
doesn't even parse correctly right now, so at very least that should be fixed.I have an implementation of my suggested precedence + disambigation. I will create a PR once I have done some testing.
I was able to eliminate RESTRICTION_NO_DOTS
at the cost of adding a minprec
parameter to parse_binops
and parse_prefix_expr
.
I think we may need a special case to handle for i in 1.. { }
. I'd expect that to be an infinite loop, but currently it gets parsed as for i in (1..{})
.
On Sun, Jan 11, 2015 at 08:21:03AM -0800, Daniel Grunwald wrote:
I think we may need a special case to handle
for i in 1.. { }
. I'd expect that to be an infinite loop, but currently it gets parsed asfor i in (1..{})
.
We do generally have a subset of expression types that appear in
positions terminated by {
(e.g., if
, for
). This seems relevant
here too.
Yes, I already handled that case in my implementation: https://github.com/rust-lang/rust/pull/20958/files#diff-da9d34ca1d0f6beee2838cf02e07345cR2942
Makes RESTRICTION_NO_STRUCT_LITERAL
a slight misnomer, it should probably be RESTRICTION_NO_OPEN_BRACE_WHERE_EXPRESSION_MIGHT_END
except that's too long.
Overview over what my pull request does:
..
precedence changed to between comparison operators and bit or; using the formal grammar:BitOrExpr = BitXorExpr | BitOrExpr "^" BitXorExpr
RangeExpr = BitOrExpr | ".." BitOrExpr | BitOrExpr ".." | BitOrExpr ".." BitOrExpr
CompExpr = RangeExpr | RangeExpr ("==" | "!=" | "<" | ">" | "<=" | ">=") RangeExpr
AndExpr = CompExpr | AndExpr "&&" CompExpr
r == 1.. && condition
), &&
and ||
are parsed as binary operators, other tokens are parsed as prefix operators.for i in 1.. {}
is parsed as for i in (1..) {}
First off, the operator precedence of
a..b
is extremely low.struct Range
recently got #[derive(Eq)], so people might want to compare a range to an expression in range notation.r == 1..10
parses as(r == 1)..10
. Similarly,1..10 == r
parses as1..(10 == r)
. Both result in type errors. I think there's no good reason for..
to have a precedence lower than comparisons. I'd suggest to put it in-between the comparison operators and the bitwise operators.But the current implementation gets weirder when we consider the other forms of range notation:
r == 1..
parses as(r == 1)..
, and is a type error (or maybe aRangeFrom<bool>
). Butr == ..1
parses asr == (..1)
, and is a valid range comparison.In the other direction,
..1 == r
parses as..(1 == r)
and is potentially aRangeTo<bool>
. But1.. == r
is a syntax error (expected one of)
or,
, found==
). This inconsistent with the prefix form.If we keep the current precedence of
..
(lower than all binops), I think we should disallowr == ..1
. I think this can be implemented easily by using RESTRICTION_NO_DOTS when parsing the RHS of any binop.Also, we should use RESTRICTION_NODOTS when parsing the RHS of prefix-version
..expr
-- currently..1..2
is a `RangeTo<Range<>>, while
1..2..3and
1..2..` both are syntax errors.If we increase the precedence, we will need to be careful not to allow code like
1 + ..2
. I think any binop with precedence smaller than that of..
will have to use RESTRICTION_NO_DOTS on the RHS.I also think the implementation can be simplified quite a bit if the prefix
..
wasn't handled inparse_prefix_expr
(as that's only supposed to be used for operators with high precedence), but at the same level as the binary and postfix version (parse_assign_expr
, orparse_binops
if we change the precedence). This would allow us to get rid of RESTRICTION_NO_DOTS and use the normal operator precedence handling instead.