rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
98.59k stars 12.74k forks source link

range notation (a..b) has weird operator precedence #20811

Closed dgrunwald closed 9 years ago

dgrunwald commented 9 years ago

First off, the operator precedence of a..b is extremely low. struct Range recently got #[derive(Eq)], so people might want to compare a range to an expression in range notation.

r == 1..10 parses as (r == 1)..10. Similarly, 1..10 == r parses as 1..(10 == r). Both result in type errors. I think there's no good reason for .. to have a precedence lower than comparisons. I'd suggest to put it in-between the comparison operators and the bitwise operators.

But the current implementation gets weirder when we consider the other forms of range notation: r == 1.. parses as (r == 1).., and is a type error (or maybe a RangeFrom<bool>). But r == ..1 parses as r == (..1), and is a valid range comparison.

In the other direction, ..1 == r parses as ..(1 == r) and is potentially a RangeTo<bool>. But 1.. == r is a syntax error (expected one of ) or ,, found ==). This inconsistent with the prefix form.

If we keep the current precedence of .. (lower than all binops), I think we should disallow r == ..1. I think this can be implemented easily by using RESTRICTION_NO_DOTS when parsing the RHS of any binop.

Also, we should use RESTRICTION_NODOTS when parsing the RHS of prefix-version ..expr -- currently ..1..2 is a `RangeTo<Range<>>, while1..2..3and1..2..` both are syntax errors.

If we increase the precedence, we will need to be careful not to allow code like 1 + ..2. I think any binop with precedence smaller than that of .. will have to use RESTRICTION_NO_DOTS on the RHS.

I also think the implementation can be simplified quite a bit if the prefix .. wasn't handled in parse_prefix_expr (as that's only supposed to be used for operators with high precedence), but at the same level as the binary and postfix version (parse_assign_expr, or parse_binops if we change the precedence). This would allow us to get rid of RESTRICTION_NO_DOTS and use the normal operator precedence handling instead.

dgrunwald commented 9 years ago

I am planning to fix this myself, but I need input on whether to increase the operator precedence so that r == 1..2 is valid.

bstrie commented 9 years ago

cc @aturon

I think this is a candidate for a nomination as well.

aturon commented 9 years ago

cc @nick29581

aturon commented 9 years ago

Nominating.

aturon commented 9 years ago

cc @nikomatsakis @alexcrichton

alexcrichton commented 9 years ago

May be related to https://github.com/rust-lang/rust/issues/20830

dgrunwald commented 9 years ago

@alexcrichton: That looks like a different bug in the same line of code.

nikomatsakis commented 9 years ago

I agree about the current precedence being wrong, and think that the suggested position is probably good, but I am very wary of the "restriction no dots" rules. Every restriction is a parser bug until proven otherwise, from my POV, and I am not convinced that we need RESTRICTION_NO_DOTS at all anymore.

dgrunwald commented 9 years ago

@nikomatsakis: I think some kind of parser special case will always be necessary if we want to allow v[0..a+b], avoid grammar inconsistencies between ..a and a.., and not have a combinatorial explosion in the complexity of the formal grammar at the same time.

I know this problem from when I tried writing a C# parser using a parser generator. It turns out that the Microsoft C# compiler is a hand-written recursive-descent parser using the shunting yard algorithm to parse binops. But not all C# binops are true binary operators taking two expressions: the as operator takes a type on the RHS, yet has lower precedence than some other binary operators. This results in some strange effects -- a + b as c + d is valid in C# and parses as ((a + b) as c) + d, even though the precedence of as is lower than that of +. As far as I know, there is no way to write a formal grammar that accepts the same expressions as the MS C# compiler without duplicating all expression-related productions.

I'd prefer if rust does not make the same mistake as C#, so any prefix- and suffix- operators must be forbidden from being used in a context which conflicts with their operator precedence, even if operator precedence is unnecessary to disambiguate the expression. As far as I know, .. is currently the only prefix-/suffix- operator where this problem occurs, since as and all unary operators have a higher precedence than any binary operator.

Here's how I imagine the formal grammar:

BitOrExpr = BitXorExpr | BitOrExpr "^" BitXorExpr
RangeExpr = BitOrExpr | ".." BitOrExpr | BitOrExpr ".." | BitOrExpr ".." BitOrExpr
CompExpr = RangeExpr | RangeExpr ("==" | "!=" | "<" | ">" | "<=" | ">=") RangeExpr
AndExpr = CompExpr | AndExpr "&&" CompExpr

This means that "^" and "&&" are left-associative; comparison operators don't have any associativity (since RFC 558); and the range expression doesn't have any associativity either.

I hope that this can be implemented without using a restrictions bitflag in the parser. But if not, I'd prefer special cases in the parser over special cases in the grammar.

dgrunwald commented 9 years ago

Changing the precedence level as I suggested actually creates new syntactic ambiguities: a == b.. && c currently is... actually a syntax error because can_begin_expr doesn't handle && (#20901), but if that is fixed, it would be: (a == b)..(&&c)

With my suggestion, it would be ambiguous between a == (b..(&&c)) and a == (b..) && c

I'm starting to think we should keep the precedence as-is, so that .. must always be enclosed in parenthesis when used in a more complex expression, and only fix the inconsistencies between the prefix form and the other two forms.

nikomatsakis commented 9 years ago

On Sat, Jan 10, 2015 at 09:56:42AM -0800, Daniel Grunwald wrote:

@nikomatsakis: I think some kind of parser special case will always be necessary if we want to allow v[0..a+b], avoid grammar inconsistencies between ..a and a.., and not have a combinatorial explosion in the complexity of the formal grammar at the same time.

Can you spell this out a bit more?

nikomatsakis commented 9 years ago

On Sat, Jan 10, 2015 at 01:43:35PM -0800, Daniel Grunwald wrote:

With my suggestion, it would be ambiguous between a == (b..(&&c)) and a == (b..) && c

Why is this not just an argument for a more-refined precedence (like, we pick one interpretation and run with it). Only one is remotely sensible (b..(&&c)) so that seems to be the one to use. That said, I've not sat down and looked at the table of precedences to see how that would fit.

dgrunwald commented 9 years ago

Why is this not just an argument for a more-refined precedence (like, we pick one interpretation and run with it). Only one is remotely sensible (b..(&&c)) so that seems to be the one to use.

True, I guess we can just pick one interpretation and use it. Though I would have picked the other interpretation: if r1 == 1.. && r2 == ..2 {} should probably be if r1 == (1..) && r2 == (..2) {} The other choice would disambiguate it as if r1 == (1.. (&& r2)) == ..2 {}.

I'm starting to think we should keep the precedence as-is, so that .. must always be enclosed in parenthesis

When I wrote that, I was just concerned about the complexity of having an optional expression in a chain of binops, but I think probably isn't as bad as I initially thought.

The only binops that potentially conflict with the initial token in an expression are: &&, ||, *, & and .. itself. (UFCS will add < and <<) Only those with a precedence lower than .. are problematic, so that's && and || only. I don't see much use for ranges of double references or ranges of lambdas, so I'd add the following rule to disambiguate the formal grammar I've given: If .. is followed by a token that can be interpreted both as start of expression or binary operator, it will be parsed as:

This corresponds to the disambiguation of if r1 == 1.. && r2 == ..2 {} as if r1 == (1..) && r2 == (..2) {}, but 1..*i will be 1..(*i), not (1..)*(i).

Gankra commented 9 years ago

20921 points out that for i in 0 .. n {} doesn't even parse correctly right now, so at very least that should be fixed.

dgrunwald commented 9 years ago

I have an implementation of my suggested precedence + disambigation. I will create a PR once I have done some testing.

I was able to eliminate RESTRICTION_NO_DOTS at the cost of adding a minprec parameter to parse_binops and parse_prefix_expr.

dgrunwald commented 9 years ago

I think we may need a special case to handle for i in 1.. { }. I'd expect that to be an infinite loop, but currently it gets parsed as for i in (1..{}).

nikomatsakis commented 9 years ago

On Sun, Jan 11, 2015 at 08:21:03AM -0800, Daniel Grunwald wrote:

I think we may need a special case to handle for i in 1.. { }. I'd expect that to be an infinite loop, but currently it gets parsed as for i in (1..{}).

We do generally have a subset of expression types that appear in positions terminated by { (e.g., if, for). This seems relevant here too.

dgrunwald commented 9 years ago

Yes, I already handled that case in my implementation: https://github.com/rust-lang/rust/pull/20958/files#diff-da9d34ca1d0f6beee2838cf02e07345cR2942 Makes RESTRICTION_NO_STRUCT_LITERAL a slight misnomer, it should probably be RESTRICTION_NO_OPEN_BRACE_WHERE_EXPRESSION_MIGHT_END except that's too long.

Overview over what my pull request does:

BitOrExpr = BitXorExpr | BitOrExpr "^" BitXorExpr
RangeExpr = BitOrExpr | ".." BitOrExpr | BitOrExpr ".." | BitOrExpr ".." BitOrExpr
CompExpr = RangeExpr | RangeExpr ("==" | "!=" | "<" | ">" | "<=" | ">=") RangeExpr
AndExpr = CompExpr | AndExpr "&&" CompExpr