C#-style named arguments emit type ascription errors

workingjubilee commented 12 months ago

Code

#[pyfunction]
fn return_object(py: Python<'_>, x:PyObject) -> i64 {
  let a = x.getattr(py, attr_name: size);
  return a;
}

Current output

error: expected identifier, found `:`
  --> src/lib.rs:44:34
   |
44 |   let a = x.getattr(py, attr_name: size);
   |                                  ^ expected identifier
   |
   = note: type ascription syntax has been removed, see issue #101728 <https://github.com/rust-lang/rust/issues/101728>

error: could not compile `tryout-pyo3` (lib) due to previous error

Desired output

error: expected identifier, found `:`
  --> src/lib.rs:44:34
   |
44 |   let a = x.getattr(py, attr_name: size);
   |                         ^^^^^^^^^^^^^^^ you cannot use named arguments
   |
help: try passing a value instead?
   |
44 |   let a = x.getattr(py, size);

error: could not compile `tryout-pyo3` (lib) due to previous error

Rationale and extra context

Many close variations emit really bad diagnostics as well, often referencing type ascription, and all of these variations being really bad causes incredible amounts of thrashing while trying to learn Rust.

Other cases

No response

Anything else?

Note that this suggestion also is wrong, it should be "size", because &'static str (and probably a few other relevant types) implements IntoPy for PyString.

workingjubilee commented 12 months ago

see also:

estebank commented 12 months ago

This case will be easier than the one in #115192, because : is not a valid thing that can live in expressions (anymore). All that we need to do is change parse_expr_paren_seq to do a check for whether the current token is an ident and a lookahead of 1 to see if the next token is a colon (within the closure for comma separated things). If you want to be fancy, you can also set a snapshot and verify that parsing an expression after the : actually works (because someone might be trying to use type ascription, which means them writing a type there, instead of an expression). The = is trickier because it can't be handled in the parser.

reez12g commented 12 months ago

@rustbot claim

nouritsu commented 10 months ago

@rustbot claim

nouritsu commented 10 months ago

Anyone up to mentor? Not my first GitHub contribution, but it is my first contribution to the Rust project.

workingjubilee commented 10 months ago

Please do not use the experts map, it is outdated and one of the people you pinged is on vacation. One of the people on the map has not been a compiler contributor for four years.

estebank commented 10 months ago

@nouritsu I am available for help. For the : case we have to modify the parser, specifically parse_expr_paren_seq which gets called from parse_expr_fn_call. Right now that calls parse_expr_catch_underscore for each argument after finding a , or the closing ).

Looking at it we probably only need to modify the error branch here to also look for "is the thing parsed so far a bare ident, do we have a : token and is the next stuff suitable to start a type? If so, special case and recover".

https://github.com/rust-lang/rust/blob/4e418805da5867bc48d82ba1cc7eff2ba68be575/compiler/rustc_parse/src/parser/expr.rs#L127-L142

nouritsu commented 10 months ago

I understand, I'll start working on this after I get back from school in ≈16 hours

nouritsu commented 10 months ago

So far I think the match arm should look like -

Some(i) if self.may_recover() && self.look_ahead(1, |t| t == &token::Colon) && self.look_ahead(2, |t| t == /*?Is a Type?*/)

I am unsure how to check if the 2nd look ahead is suitable to start a type.

Edit: could &token::NtTy(?) be used, I'm not sure what it wraps around

estebank commented 10 months ago

Looking at parse_ty_common the logic to check whether a type start is valid is quite complex. I thought we already had a can_begin_type, but we only have can_begin_expr. We have a handful of options:

check for : and after that create a snapshot and try to parse the type (this is quite a bit of code, but foolproof)
write a new method for can_begin_type, but such a method for a single place is a lot of work
ignore the problem for now and just emit an error saying "hey, if the following is a type, then..."
flat out try to parse a type, if it is an error, emit it and return the earlier one

A PR with the "ignore" approach would be acceptable, I'd prefer the first approach for functionality, eventually.

nouritsu commented 10 months ago

I'd love to work on the first approach, sounds more elegant than just ignoring the error. I would require some time to understand snapshot, maybe a week or two. If solving this issue is a matter of urgency, I would not mind if I was unassigned for someone who can do it faster.

nouritsu commented 10 months ago

Also question: why not go with the second approach of can_begin_type which can be used in parse_ty_common?

estebank commented 10 months ago

Also question: why not go with the second approach of can_begin_type which can be used in parse_ty_common?

Because right now can_begin_type checks each alternative in order and errors out on fallthrough. Having the additional function would be redundant (which means that behavior can diverge by accident).

snapshot is pretty much "take the self Parser and clone() it". You can look for calls to it and the subsequent logic, where we usually do something like

let snapshot = self.snapshot();
let recovered_foo = match self.parse_foo() {
    Ok(foo) => {
        err.note("we know that it was followed by a foo");
        err.emit();
        foo
    }
    Err(mut foo_err) => {
        // The parse failed, so we clean up to before the parse.
        foo_err.cancel();
        self.restore_snapshot(snapshot);
        return Err(err);
    }
};

The tricky thing is when you have a lot of state in flight (like parsing multiple things in a row) you have to be very careful to actually handle every possible code path and clean up appropriately.

nouritsu commented 10 months ago

right, I understand snapshots and the control flow based on above conversation -

fn parse_expr_catch_underscore(&mut self, restrictions: Restrictions) -> PResult<'a, P<Expr>> {
        match self.parse_expr_res(restrictions, None) {
            Ok(expr) => Ok(expr),
            Err(mut err) => match self.token.ident() {
                Some(i) if self.may_recover() && self.look_ahead(1, |t| t == &token::Colon) => {
                    let snapshot = self.create_snapshot_for_diagnostic();

                    match self.parse_ty() {
                        Ok(_) => { /* user is trying to use type scription syntax */ }
                        Err(_) => { /* user is trying to c# style use named arguments */ }
                    }

                    self.restore_snapshot(snapshot);
                }
                Some((Ident { name: kw::Underscore, .. }, false))
                    if self.may_recover() && self.look_ahead(1, |t| t == &token::Comma) =>
                {
                    // Special-case handling of `foo(_, _, _)`
                    err.emit();
                    self.bump();
                    Ok(self.mk_expr(self.prev_token.span, ExprKind::Err))
                }
                _ => Err(err),
            },
        }
    }

I'll move the restore to just before an error is returned. How do I return a type ascription error or a named argument error? Usually in other rust projects there is an enum that contains error variants, like -

enum E {
    LengthError,
    TypeError,
    FormatError,
}

Does the parser have such an enum? If not, what do I use?

I tried to dig through PResult then the DiagnosticBuilder but I don't think that's the correct direction.

estebank commented 10 months ago

PResult<'a, T> is Result<T, DiagnosticBuilder<'a>>. The DiagnosticBuilder is a struct that contains all the info that gets displayed to the user and is indeed what you want to return. Every function that can fail in the parser will return such a displayable error. Anywhere where the parser itself needs to understand what failed uses a Result<T, (some_metadata, DiagnosticBuilder<'a>)> or the like.

nouritsu commented 10 months ago

I see, for now I'd like to switch my approach to the "ignore the problem and emit error" option. This would help me understand the structure slightly better. I don't understand how doing adding a branch that checks for ident : helps in changing the error message to what OP wanted.

Note that after I get the ignore approach working, I'll be switching to the snapshot approach and make a PR

estebank commented 10 months ago

If you can parse an expression after the :, it means you can provide a span_suggestion_verbose(ident.span.until(expr.span), "remove the name for the parameter", "", Applicability::MaybeIncorrect);.

nouritsu commented 10 months ago

I implemented that, while tested my error message never appears -

fn main() {
    let a = 50;

    foo(x: i32);
}

fn foo(x: i32) {}

Is the type ascription error only shown with methods?

nouritsu commented 10 months ago

It appears the error only occurs with methods -

fn main() {
    let a = 50;

    Foo::new().foo(x: a);

}

struct Foo;

impl Foo {
    pub fn new() -> Self {
        Self
    }

    pub fn foo(x: i32){}
}

The changes I made in parse_expr_catch_underscore did not make a difference.

estebank commented 10 months ago

Try to unify the handling of arguments: method and function calls should have the same errors for this (except the struct literal suggestion).

nouritsu commented 10 months ago

I have identified why the code never enters the branch.

fn parse_expr_catch_underscore(&mut self, restrictions: Restrictions) -> PResult<'a, P<Expr>> {
        match self.parse_expr_res(restrictions, None) {
            Ok(expr) => Ok(expr),
            Err(mut err) => match self.token.ident() {
                Some((Ident { name: kw::Underscore, .. }, false))
                    if self.may_recover() && self.look_ahead(1, |t| t == &token::Comma) =>
                {
                    // Special-case handling of `foo(_, _, _)`
                    err.emit();
                    self.bump();
                    Ok(self.mk_expr(self.prev_token.span, ExprKind::Err))
                }
                Some(_) if self.may_recover() && self.look_ahead(0, |t| t == &token::Colon) => {
                    /* my branch */
                    panic!("Entered Branch")
                }
                s => {
                    println!(
                        "{}\t{}\t{:?}\t{:?}\t{:?}",
                        self.may_recover(),
                        self.look_ahead(0, |t| t == &token::Colon),
                        s,
                        self.token,
                        self.prev_token,
                    );
                    Err(err)
                }
            },
        }
    }

Testing with the following code -

fn main() {
    let a = 50;

    foo(x: a)
}

fn foo(x: i32) {}

The logged variables have values as follows -

true    true    None    Token { kind: Colon, span: .\main.rs:4:10: 4:11 (#0) }  Token { kind: Ident("x", false), span: .\main.rs:4:9: 4:10 (#0) }

The parser progresses to the colon without ever being bumped. Should I match against self.prev_token.ident() in the Err branch of self.parse_expr_res?

Or should I match against None if self.prev_token.is_ident() in the match for self.token.ident()

I think the second approach is cleaner, I'll implement it for now but would love your input.

nouritsu commented 10 months ago

I got the suggestion working (kinda). Is there a way to combine two spans?

Solved: span.to()

But now I have this -

nouritsu commented 10 months ago

I'm waiting for a response. Meanwhile, I'm exploring neighboring and parent functions to understand rust semantics better.

estebank commented 10 months ago

The parser progresses to the colon without ever being bumped.

When doing a look_ahead check and finding what you want, you'll want to call self.bump() to advance the parser state.

self.look_ahead(0, |t| t == &token::Colon)

You don't need to do look_ahead(0, that's just self.token.

For what you want to do, in most cases it is fine to leave stray whitespace, but the way to deal with it is with param_name_span.until(ty.span). This creates a new Span that starts at param_name_span.lo() and ends at ty.span.lo(). This will also include the whitespace before the type.

nouritsu commented 10 months ago

I'm done with my exams, I'll start working on this again. Thank you for the response, I'll implement those changes.

nouritsu commented 10 months ago

I think I have come up with a solution -

fn parse_expr_catch_underscore(&mut self, restrictions: Restrictions) -> PResult<'a, P<Expr>> {
        match self.parse_expr_res(restrictions, None) {
            Ok(expr) => Ok(expr),
            Err(mut err) => match self.token.ident() {
                Some((Ident { name: kw::Underscore, .. }, false))
                    if self.may_recover() && self.look_ahead(1, |t| t == &token::Comma) =>
                {
                    // Special-case handling of `foo(_, _, _)`
                    err.emit();
                    self.bump();
                    Ok(self.mk_expr(self.prev_token.span, ExprKind::Err))
                }
                None if self.may_recover()
                    && self.prev_token.is_ident()
                    && self.token.kind == token::Colon =>
                {
                    err.span_suggestion_verbose(
                        self.prev_token.span.until(self.look_ahead(1, |t| t.span)),
                        "if this is a parameter, remove the name for the parameter",
                        "",
                        Applicability::MaybeIncorrect,
                    );
                    self.bump();
                    err.emit();
                    Ok(self.mk_expr(self.token.span, ExprKind::Err))
                }
                _ => Err(err),
            },
        }
    }

Test code - 1.

fn main() {
    let a = 50;

    foo(x: a);
}

fn foo(x: i32) {}

Output -


fn main() {
let a = 50;

Foo::bar(x: a);
}

struct Foo { pub fn bar(x : i32){} }


Output -
![image](https://github.com/rust-lang/rust/assets/113834791/858edd61-c8f8-458f-a8c7-60fae31c3283)

Note -
Another question: After my error message is emitted, I'm returning a `Ok(self.mk_expr(self.token.span, ExprKind::Err))`, which is what is being emitted after my error. Is there something better I can return that would result in a more concise, targeted error message?

estebank commented 10 months ago

@nouritsu you should create a PR with your code, it'll be easier to discuss specific changes. You might notice that you have a second parse error because the a isn't consumed (after the self.bump() in the None case), and ideally we'd get rid of that as well. Besides that, I think the approach is sound.

nouritsu commented 10 months ago

Created a PR.

nouritsu commented 9 months ago

Can we remove the E - easy tag from this? #118733 has the reasons why

workingjubilee commented 9 months ago

"easy" is unfortunately always "relative to other issues" but yes, this has proven much more annoying than I expected it to be for anyone. Quite sorry, @nouritsu.

nouritsu commented 9 months ago

No need for the apology, this has been an excellent learning experience. I look forward to contributing to rust in the future.

rust-lang / rust