rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
99.06k stars 12.79k forks source link

Range expressions: discrepancies between rustc and parser-lalr #28785

Open rprichard opened 9 years ago

rprichard commented 9 years ago

I'd guess that the last test case here is a rustc bug, but I don't know about the others.

fn match_rangefull() {
    match .. { _ => () } // rustc compiles it, parser-lalr rejects
}
fn field_of_range() {
    123.. .start; // rustc rejects, parser-lalr accepts
}
fn eq_rangefull() {
    .. == ..; // rustc rejects, parser-lalr accepts
}
fn return_rangeto() -> std::ops::RangeTo<i32> {
    return ..5; // rustc compiles it, parser-lalr parses it as (return)..5
}
fn assign_range() {
    let x;
    x = 4..5; // rustc compiles it, parser-lalr parses it as (x = 4)..5
}
fn nonblock_expr_range() {
    // rustc compiles this, and I don't think it should.  I think parser-lalr
    // gets it right.
    //
    //    rustc parses it like:       let _one = (1..).start;
    //    parser-lalr parses it like: let _one = {1; ..}.start;
    //
    let _one = { {1} .. }.start;
}
fn main() {}

Also see https://github.com/rust-lang/rust/issues/25119.

rprichard commented 9 years ago

Looking through https://github.com/rust-lang/rust/issues/20811, I see two more cases where rustc and parser-lalr disagree:

fn rangefrom_compare() {
    1.. == (1..); // rustc syntax error, parser-lalr parses as (1..) == (1..)
}
fn compare_rangeto() {
    (..1) == ..1; // rustc syntax error, parser-lalr parses as (..1) == (..1)
}
nikomatsakis commented 9 years ago

triage: P-medium

This is a backwards incompat issue, but feels like an edge case. Nonetheless we should definitely fix it!

dgrunwald commented 9 years ago

Can you explain why you think that { {1} .. }.start should parse as {1; ..}.start? I'm not familiar with the details of Rust's semicolon insertion.

I would expect {} within an expression to work pretty much wherever () does (except in if conditions / match expressions or similar, where {} starts the control flow block), and ((1)..).start is definitely valid syntax.

On all other examples, I'd say the rustc behavior is correct. The decision in #21374 was: .. is non-associative has the same precedence as the assignment operator. The unary forms of .. are only valid at the same precedence level as the binary form.

Comparing ranges needs parentheses: the attempt to give .. a higher precedence than the comparison operators in #20811 was not adopted because it caused new ambiguities between unary and binary &&, e.g. in r == 1.. && condition.

Admittedly, having .. at the same precedence level as assignment is somewhat weird, as the operators on that level have different associativity (assignment is right-associative; .. is non-associative). It would be better to put .. on its own level directly above the assignment operator and placement-new-arrow.

rprichard commented 9 years ago

Regarding { {1} .. }.start, my reasoning was based on the idea that a "statement-like" expression in statement position cannot start a binary expression. e.g. let x = { {} - 1 }; parses and type-checks because it's the same thing as:

let x = {
  {}
  -1
}

(Rust ignores newlines outside of string-literals, AFAIK.)

let x = { () - 1 }; doesn't type-check.

@nikomatsakis made some comments to this effect here:

Of course, 1.. is a unary expression, not a binary expression.

FWIW, I noticed that the current nightly Rust compiler is now parsing { {1} .. }.start differently -- the .. is parsed as a nullary operator (i.e. as-if there were a semicolon after the {1}).

dgrunwald commented 9 years ago

That makes sense; I didn't realize we had that rule about {} not starting binary expressions.

It also fits what I wrote in #25119: .. is only accepted as expression in locations where a..b would also be accepted. { {1} a..b } parses as if there's a semicolon after the {1}, so the same should apply to { { 1 } .. }.

In a formal grammar, the restrictions around .. are easily expressed by using a production for every precedence level:

asgn_expr
: range_expr
| range_expr "=" asgn_expr
;

range_expr
: logic_or_expr
| logic_or_expr ".." logic_or_expr
| logic_or_expr ".."
| ".." logic_or_expr
| ".."

logic_or_expr
: logic_and_expr
| logic_or_expr "||" logic_and_expr
;

Is there any way to do this with the %precedence mechanism? A production for each precedence level would bloat the grammar quite a bit, given that they get multiplied with the restrictions (normal, nonblock, nonparen, nostruct).

rprichard commented 9 years ago

I think we can reduce the problem to:

expr
: DOTDOT
| expr DOT ID
| ID
{}
;

We need to reject DOTDOT DOT ID. My full test case is here. Menhir and yacc both think my precedence declarations are useless. I suspect the precedence mechanism doesn't do what we want.

(Aside: I'm finding menhir --interpret --interpret-show-cst useful for playing with grammars. You can type DOTDOT DOT ID and see what CST is parsed, or whether it's rejected.)

Some parser generators (e.g. menhir) allowing parameterizing rules. e.g. Maybe a formal grammar could have a single range_expr rule that is parameterized over each of the restrictions.

(Also: I get the impression that the nonparen restriction class might be going away. It's needed only for the placement new box (xxx) yyy syntax, which has changed to box in xxx { yyy }.)

dgrunwald commented 9 years ago

Actually, I'm starting to think that the rustc behavior might not be ideal: it interacts badly with lambda syntax.

Rust currently has 7 non-associative operators: .. and the 6 comparison operators. But within lambdas, these behave differently!

Rustc currently rejects |a, b| a == b == c as a syntax error (cannot chain comparison operators). But it parses |a, b| a .. b .. c successfully, as (|a, b| (a .. b)) .. c.

Lambdas have the same "precedence inversion" problem that I tried to avoid for (unary) ..: they can appear within operators with high precedence, but contain an arbitrary expression with lower precedence.

For example, a * || b + c is a syntactically valid expression, parsing as a * (|| (b + c)). A programmer who knows about the precedence of * and + but not about the effect of lambdas would probably expect (a * (|| b)) + c instead.

This gets especially problematic when the programmer assumes the lambda body is an expression, but the expression actually ends earlier. Is |a, b| a .. b .. c the only case of a valid expression that "breaks out" of the lambda?

One solution would be to parse .. as parser-larl does, and then later reject syntax trees where .. appears as operand of another operator with higher precedence. (similar to how chained comparisons are being rejected)

The other option would be to disallow lambdas from appearing as left-hand-operand of a binary operator (just like we already disallow blocks in such a position).

dgrunwald commented 9 years ago

|a, b| a .. b .. c isn't the only case, I found another one. The following is a valid rust program, but probably shouldn't be:

fn main() {
    || println!("Hello World") as ()()
}

I think we should disallow lambdas from appearing as left-hand-operand of a binary operator or as operand of a suffix expression.

nikomatsakis commented 9 years ago

On Thu, Nov 05, 2015 at 09:59:14AM -0800, Ryan Prichard wrote:

Some parser generators (e.g. menhir) allowing parameterizing rules. e.g. Maybe a formal grammar could have a single range_expr rule that is parameterized over each of the restrictions.

Just wanted to point out that LALRPOP supports this sort of parameterization (and generates Rust). At some point I want to go ahead and port one of these LALR Rust grammars to LALRPOP.

brson commented 8 years ago

@nikomatsakis can mentor. Needs investigation into the correct grammar.

Mark-Simulacrum commented 7 years ago

@nikomatsakis Could you either provide some instructions here, or otherwise unmark as E-mentor?

steveklabnik commented 6 years ago

Triage; is parser-lalr still a thing? is this issue relevant?

also, removing e-mentor

dgrunwald commented 2 years ago

Apparently || println!("Hello World") as ()() was fixed at some point by the introduction of a "casts cannot be followed by a function call" error, but the issues with the interaction of .. and lambdas still exist.

dtolnay commented 2 years ago