Open workingjubilee opened 12 months ago
This case will be easier than the one in #115192, because :
is not a valid thing that can live in expressions (anymore). All that we need to do is change parse_expr_paren_seq
to do a check for whether the current token is an ident and a lookahead of 1 to see if the next token is a colon (within the closure for comma separated things). If you want to be fancy, you can also set a snapshot and verify that parsing an expression after the :
actually works (because someone might be trying to use type ascription, which means them writing a type there, instead of an expression). The =
is trickier because it can't be handled in the parser.
@rustbot claim
@rustbot claim
Anyone up to mentor? Not my first GitHub contribution, but it is my first contribution to the Rust project.
Please do not use the experts map, it is outdated and one of the people you pinged is on vacation. One of the people on the map has not been a compiler contributor for four years.
@nouritsu I am available for help. For the :
case we have to modify the parser, specifically parse_expr_paren_seq
which gets called from parse_expr_fn_call
. Right now that calls parse_expr_catch_underscore
for each argument after finding a ,
or the closing )
.
Looking at it we probably only need to modify the error branch here to also look for "is the thing parsed so far a bare ident, do we have a :
token and is the next stuff suitable to start a type? If so, special case and recover".
I understand, I'll start working on this after I get back from school in ≈16 hours
So far I think the match arm should look like -
Some(i) if self.may_recover() && self.look_ahead(1, |t| t == &token::Colon) && self.look_ahead(2, |t| t == /*?Is a Type?*/)
I am unsure how to check if the 2nd look ahead is suitable to start a type.
Edit: could &token::NtTy(?)
be used, I'm not sure what it wraps around
Looking at parse_ty_common
the logic to check whether a type start is valid is quite complex. I thought we already had a can_begin_type
, but we only have can_begin_expr
. We have a handful of options:
:
and after that create a snapshot
and try to parse the type (this is quite a bit of code, but foolproof)can_begin_type
, but such a method for a single place is a lot of workA PR with the "ignore" approach would be acceptable, I'd prefer the first approach for functionality, eventually.
I'd love to work on the first approach, sounds more elegant than just ignoring the error. I would require some time to understand snapshot
, maybe a week or two. If solving this issue is a matter of urgency, I would not mind if I was unassigned for someone who can do it faster.
Also question: why not go with the second approach of can_begin_type
which can be used in parse_ty_common
?
Also question: why not go with the second approach of can_begin_type which can be used in parse_ty_common?
Because right now can_begin_type
checks each alternative in order and errors out on fallthrough. Having the additional function would be redundant (which means that behavior can diverge by accident).
snapshot
is pretty much "take the self
Parser
and clone()
it". You can look for calls to it and the subsequent logic, where we usually do something like
let snapshot = self.snapshot();
let recovered_foo = match self.parse_foo() {
Ok(foo) => {
err.note("we know that it was followed by a foo");
err.emit();
foo
}
Err(mut foo_err) => {
// The parse failed, so we clean up to before the parse.
foo_err.cancel();
self.restore_snapshot(snapshot);
return Err(err);
}
};
The tricky thing is when you have a lot of state in flight (like parsing multiple things in a row) you have to be very careful to actually handle every possible code path and clean up appropriately.
right, I understand snapshots and the control flow based on above conversation -
fn parse_expr_catch_underscore(&mut self, restrictions: Restrictions) -> PResult<'a, P<Expr>> {
match self.parse_expr_res(restrictions, None) {
Ok(expr) => Ok(expr),
Err(mut err) => match self.token.ident() {
Some(i) if self.may_recover() && self.look_ahead(1, |t| t == &token::Colon) => {
let snapshot = self.create_snapshot_for_diagnostic();
match self.parse_ty() {
Ok(_) => { /* user is trying to use type scription syntax */ }
Err(_) => { /* user is trying to c# style use named arguments */ }
}
self.restore_snapshot(snapshot);
}
Some((Ident { name: kw::Underscore, .. }, false))
if self.may_recover() && self.look_ahead(1, |t| t == &token::Comma) =>
{
// Special-case handling of `foo(_, _, _)`
err.emit();
self.bump();
Ok(self.mk_expr(self.prev_token.span, ExprKind::Err))
}
_ => Err(err),
},
}
}
I'll move the restore to just before an error is returned. How do I return a type ascription error or a named argument error? Usually in other rust projects there is an enum that contains error variants, like -
enum E {
LengthError,
TypeError,
FormatError,
}
Does the parser have such an enum? If not, what do I use?
I tried to dig through PResult then the DiagnosticBuilder but I don't think that's the correct direction.
PResult<'a, T>
is Result<T, DiagnosticBuilder<'a>>
. The DiagnosticBuilder
is a struct that contains all the info that gets displayed to the user and is indeed what you want to return. Every function that can fail in the parser will return such a displayable error. Anywhere where the parser itself needs to understand what failed uses a Result<T, (some_metadata, DiagnosticBuilder<'a>)>
or the like.
I see, for now I'd like to switch my approach to the "ignore the problem and emit error" option. This would help me understand the structure slightly better. I don't understand how doing adding a branch that checks for ident :
helps in changing the error message to what OP wanted.
Note that after I get the ignore approach working, I'll be switching to the snapshot approach and make a PR
If you can parse an expression after the :
, it means you can provide a span_suggestion_verbose(ident.span.until(expr.span), "remove the name for the parameter", "", Applicability::MaybeIncorrect);
.
I implemented that, while tested my error message never appears -
fn main() {
let a = 50;
foo(x: i32);
}
fn foo(x: i32) {}
Is the type ascription error only shown with methods?
It appears the error only occurs with methods -
fn main() {
let a = 50;
Foo::new().foo(x: a);
}
struct Foo;
impl Foo {
pub fn new() -> Self {
Self
}
pub fn foo(x: i32){}
}
The changes I made in parse_expr_catch_underscore
did not make a difference.
Try to unify the handling of arguments: method and function calls should have the same errors for this (except the struct literal suggestion).
I have identified why the code never enters the branch.
fn parse_expr_catch_underscore(&mut self, restrictions: Restrictions) -> PResult<'a, P<Expr>> {
match self.parse_expr_res(restrictions, None) {
Ok(expr) => Ok(expr),
Err(mut err) => match self.token.ident() {
Some((Ident { name: kw::Underscore, .. }, false))
if self.may_recover() && self.look_ahead(1, |t| t == &token::Comma) =>
{
// Special-case handling of `foo(_, _, _)`
err.emit();
self.bump();
Ok(self.mk_expr(self.prev_token.span, ExprKind::Err))
}
Some(_) if self.may_recover() && self.look_ahead(0, |t| t == &token::Colon) => {
/* my branch */
panic!("Entered Branch")
}
s => {
println!(
"{}\t{}\t{:?}\t{:?}\t{:?}",
self.may_recover(),
self.look_ahead(0, |t| t == &token::Colon),
s,
self.token,
self.prev_token,
);
Err(err)
}
},
}
}
Testing with the following code -
fn main() {
let a = 50;
foo(x: a)
}
fn foo(x: i32) {}
The logged variables have values as follows -
true true None Token { kind: Colon, span: .\main.rs:4:10: 4:11 (#0) } Token { kind: Ident("x", false), span: .\main.rs:4:9: 4:10 (#0) }
The parser progresses to the colon without ever being bumped. Should I match against self.prev_token.ident()
in the Err branch of self.parse_expr_res
?
Or should I match against None if self.prev_token.is_ident()
in the match for self.token.ident()
I think the second approach is cleaner, I'll implement it for now but would love your input.
I got the suggestion working (kinda). Is there a way to combine two spans?
Solved: span.to()
But now I have this -
I'm waiting for a response. Meanwhile, I'm exploring neighboring and parent functions to understand rust semantics better.
The parser progresses to the colon without ever being bumped.
When doing a look_ahead check and finding what you want, you'll want to call self.bump()
to advance the parser state.
self.look_ahead(0, |t| t == &token::Colon)
You don't need to do look_ahead(0
, that's just self.token
.
For what you want to do, in most cases it is fine to leave stray whitespace, but the way to deal with it is with param_name_span.until(ty.span)
. This creates a new Span
that starts at param_name_span.lo()
and ends at ty.span.lo()
. This will also include the whitespace before the type.
I'm done with my exams, I'll start working on this again. Thank you for the response, I'll implement those changes.
I think I have come up with a solution -
fn parse_expr_catch_underscore(&mut self, restrictions: Restrictions) -> PResult<'a, P<Expr>> {
match self.parse_expr_res(restrictions, None) {
Ok(expr) => Ok(expr),
Err(mut err) => match self.token.ident() {
Some((Ident { name: kw::Underscore, .. }, false))
if self.may_recover() && self.look_ahead(1, |t| t == &token::Comma) =>
{
// Special-case handling of `foo(_, _, _)`
err.emit();
self.bump();
Ok(self.mk_expr(self.prev_token.span, ExprKind::Err))
}
None if self.may_recover()
&& self.prev_token.is_ident()
&& self.token.kind == token::Colon =>
{
err.span_suggestion_verbose(
self.prev_token.span.until(self.look_ahead(1, |t| t.span)),
"if this is a parameter, remove the name for the parameter",
"",
Applicability::MaybeIncorrect,
);
self.bump();
err.emit();
Ok(self.mk_expr(self.token.span, ExprKind::Err))
}
_ => Err(err),
},
}
}
Test code - 1.
fn main() {
let a = 50;
foo(x: a);
}
fn foo(x: i32) {}
Output -
fn main() {
let a = 50;
Foo::bar(x: a);
}
struct Foo { pub fn bar(x : i32){} }
Output -
![image](https://github.com/rust-lang/rust/assets/113834791/858edd61-c8f8-458f-a8c7-60fae31c3283)
Note -
Another question: After my error message is emitted, I'm returning a `Ok(self.mk_expr(self.token.span, ExprKind::Err))`, which is what is being emitted after my error. Is there something better I can return that would result in a more concise, targeted error message?
@nouritsu you should create a PR with your code, it'll be easier to discuss specific changes. You might notice that you have a second parse error because the a
isn't consumed (after the self.bump()
in the None
case), and ideally we'd get rid of that as well. Besides that, I think the approach is sound.
Created a PR.
Can we remove the E - easy
tag from this? #118733 has the reasons why
"easy" is unfortunately always "relative to other issues" but yes, this has proven much more annoying than I expected it to be for anyone. Quite sorry, @nouritsu.
No need for the apology, this has been an excellent learning experience. I look forward to contributing to rust in the future.
Code
Current output
Desired output
Rationale and extra context
Many close variations emit really bad diagnostics as well, often referencing type ascription, and all of these variations being really bad causes incredible amounts of thrashing while trying to learn Rust.
Other cases
No response
Anything else?
Note that this suggestion also is wrong, it should be
"size"
, because&'static str
(and probably a few other relevant types) implements IntoPy for PyString.