Open rprichard opened 9 years ago
Looking through https://github.com/rust-lang/rust/issues/20811, I see two more cases where rustc
and parser-lalr
disagree:
fn rangefrom_compare() {
1.. == (1..); // rustc syntax error, parser-lalr parses as (1..) == (1..)
}
fn compare_rangeto() {
(..1) == ..1; // rustc syntax error, parser-lalr parses as (..1) == (..1)
}
triage: P-medium
This is a backwards incompat issue, but feels like an edge case. Nonetheless we should definitely fix it!
Can you explain why you think that { {1} .. }.start
should parse as {1; ..}.start
? I'm not familiar with the details of Rust's semicolon insertion.
I would expect {}
within an expression to work pretty much wherever ()
does (except in if conditions / match expressions or similar, where {}
starts the control flow block), and ((1)..).start
is definitely valid syntax.
On all other examples, I'd say the rustc behavior is correct. The decision in #21374 was: ..
is non-associative has the same precedence as the assignment operator. The unary forms of ..
are only valid at the same precedence level as the binary form.
Comparing ranges needs parentheses: the attempt to give ..
a higher precedence than the comparison operators in #20811 was not adopted because it caused new ambiguities between unary and binary &&
, e.g. in r == 1.. && condition
.
Admittedly, having ..
at the same precedence level as assignment is somewhat weird, as the operators on that level have different associativity (assignment is right-associative; ..
is non-associative). It would be better to put ..
on its own level directly above the assignment operator and placement-new-arrow.
Regarding { {1} .. }.start
, my reasoning was based on the idea that a "statement-like" expression in statement position cannot start a binary expression. e.g. let x = { {} - 1 };
parses and type-checks because it's the same thing as:
let x = {
{}
-1
}
(Rust ignores newlines outside of string-literals, AFAIK.)
let x = { () - 1 };
doesn't type-check.
@nikomatsakis made some comments to this effect here:
Of course, 1..
is a unary expression, not a binary expression.
FWIW, I noticed that the current nightly Rust compiler is now parsing { {1} .. }.start
differently -- the ..
is parsed as a nullary operator (i.e. as-if there were a semicolon after the {1}
).
That makes sense; I didn't realize we had that rule about {}
not starting binary expressions.
It also fits what I wrote in #25119: ..
is only accepted as expression in locations where a..b
would also be accepted.
{ {1} a..b }
parses as if there's a semicolon after the {1}
, so the same should apply to { { 1 } .. }
.
In a formal grammar, the restrictions around ..
are easily expressed by using a production for every precedence level:
asgn_expr
: range_expr
| range_expr "=" asgn_expr
;
range_expr
: logic_or_expr
| logic_or_expr ".." logic_or_expr
| logic_or_expr ".."
| ".." logic_or_expr
| ".."
logic_or_expr
: logic_and_expr
| logic_or_expr "||" logic_and_expr
;
Is there any way to do this with the %precedence
mechanism? A production for each precedence level would bloat the grammar quite a bit, given that they get multiplied with the restrictions (normal, nonblock, nonparen, nostruct).
I think we can reduce the problem to:
expr
: DOTDOT
| expr DOT ID
| ID
{}
;
We need to reject DOTDOT DOT ID
. My full test case is here. Menhir and yacc both think my precedence declarations are useless. I suspect the precedence mechanism doesn't do what we want.
(Aside: I'm finding menhir --interpret --interpret-show-cst
useful for playing with grammars. You can type DOTDOT DOT ID
and see what CST is parsed, or whether it's rejected.)
Some parser generators (e.g. menhir) allowing parameterizing rules. e.g. Maybe a formal grammar could have a single range_expr
rule that is parameterized over each of the restrictions.
(Also: I get the impression that the nonparen restriction class might be going away. It's needed only for the placement new box (xxx) yyy
syntax, which has changed to box in xxx { yyy }
.)
Actually, I'm starting to think that the rustc behavior might not be ideal: it interacts badly with lambda syntax.
Rust currently has 7 non-associative operators: ..
and the 6 comparison operators. But within lambdas, these behave differently!
Rustc currently rejects |a, b| a == b == c
as a syntax error (cannot chain comparison operators).
But it parses |a, b| a .. b .. c
successfully, as (|a, b| (a .. b)) .. c
.
Lambdas have the same "precedence inversion" problem that I tried to avoid for (unary) ..
: they can appear within operators with high precedence, but contain an arbitrary expression with lower precedence.
For example, a * || b + c
is a syntactically valid expression, parsing as a * (|| (b + c))
. A programmer who knows about the precedence of *
and +
but not about the effect of lambdas would probably expect (a * (|| b)) + c
instead.
This gets especially problematic when the programmer assumes the lambda body is an expression, but the expression actually ends earlier. Is |a, b| a .. b .. c
the only case of a valid expression that "breaks out" of the lambda?
One solution would be to parse ..
as parser-larl does, and then later reject syntax trees where ..
appears as operand of another operator with higher precedence. (similar to how chained comparisons are being rejected)
The other option would be to disallow lambdas from appearing as left-hand-operand of a binary operator (just like we already disallow blocks in such a position).
|a, b| a .. b .. c
isn't the only case, I found another one. The following is a valid rust program, but probably shouldn't be:
fn main() {
|| println!("Hello World") as ()()
}
I think we should disallow lambdas from appearing as left-hand-operand of a binary operator or as operand of a suffix expression.
On Thu, Nov 05, 2015 at 09:59:14AM -0800, Ryan Prichard wrote:
Some parser generators (e.g. menhir) allowing parameterizing rules. e.g. Maybe a formal grammar could have a single
range_expr
rule that is parameterized over each of the restrictions.
Just wanted to point out that LALRPOP supports this sort of parameterization (and generates Rust). At some point I want to go ahead and port one of these LALR Rust grammars to LALRPOP.
@nikomatsakis can mentor. Needs investigation into the correct grammar.
@nikomatsakis Could you either provide some instructions here, or otherwise unmark as E-mentor?
Triage; is parser-lalr still a thing? is this issue relevant?
also, removing e-mentor
Apparently || println!("Hello World") as ()()
was fixed at some point by the introduction of a "casts cannot be followed by a function call" error, but the issues with the interaction of ..
and lambdas still exist.
I'd guess that the last test case here is a
rustc
bug, but I don't know about the others.Also see https://github.com/rust-lang/rust/issues/25119.