Suggestion: provide better report of function argument mismatch

Quantumplation commented 5 years ago

While discussing #64915 with @estebank, I asked if it would make sense to also provide a more detailed report about how closures mismatched. For example, introspecting on the types to detect when types are in the wrong order, or when arguments are mismatched.

He pointed out that we don't even do that for regular function calls right now. For example:

https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=8288d94517c964071278be6833560f70

The above snippet could, in theory, provide a more detailed report which suggested that the arguments might be swapped in the first call, or that the second argument is missing in the second call. (Things like longest common subsequence could help here).

@estebank said he's been thinking about something like this for a while, and offered to provide some mentorship if I wanted to tackle this as a slightly bigger contribution, since it'd be easier than closure comparison off the bat.

This issue has been assigned to @Quantumplation via this comment.

estebank commented 5 years ago

I believe that the only place you'll have to look at to implement this is

https://github.com/rust-lang/rust/blob/46e6c533d08a2c6d22083a2756a0b569e001c3c4/src/librustc_typeck/check/mod.rs#L3311-L3546

particularly, https://github.com/rust-lang/rust/blob/46e6c533d08a2c6d22083a2756a0b569e001c3c4/src/librustc_typeck/check/mod.rs#L3413-L3421

Quantumplation commented 5 years ago

@rustbot claim

I'll give this a shot! Might be too ambitious for me though. :)

workingjubilee commented 5 years ago

I have a related remark that I had built up while reviewing related diagnostic commits/issues that have been floating around:

E0631 uses a concept of "found/expected", as does E0271, "found/expected", but these errors have been reported as confusing and backwards. So it seems wise to head off all confusion in the future, and it seems to me that the implied "directionality" could be read as flowing one way or another depending on perspective, and might in fact change if a compiler optimization happens to adjust the orders of things.

E0281, which is the error code that E0631 retired, used an expression more along the lines of "Z has Y, but that requires A according to B", which seems logically more time agnostic and resilient to refactoring (especially because it describes more closely the "this atom is being asked to be both Hydrogen and Helium" problem). The only problem as-such is that diagnostics probably are best if they are kept relatively terse and simple, but explicitness is still preferable.

Quantumplation commented 5 years ago

@workingjubilee I had noticed the same. Especially confusing is that in the code for #64915, it seems variable names got swapped around (in multiple places!), so it got really confusing to trace through what happened...

Quantumplation commented 5 years ago

Some notes for myself, while I begin to understand the code and think through the problem. Please feel free to chime in if you notice something I'm missing or misinterpreting!

Closures

I think

https://github.com/rust-lang/rust/blob/46e6c533d08a2c6d22083a2756a0b569e001c3c4/src/librustc_typeck/check/mod.rs#L3379-L3383

is where we handle closure cases (this tuple_arguments call seems to be set to TupleArguments when for closures and overloaded functions), so could probably benefit from the same enhancement.

Variadic Functions

handles external variadic c functions, and if we have a description of the types, we could suggest what's missing.

formal_tys

https://github.com/rust-lang/rust/blob/46e6c533d08a2c6d22083a2756a0b569e001c3c4/src/librustc_typeck/check/mod.rs#L3421

Here, we report the mismatch arg count error, and then here:

https://github.com/rust-lang/rust/blob/46e6c533d08a2c6d22083a2756a0b569e001c3c4/src/librustc_typeck/check/mod.rs#L3423-L3424

we assign a bunch of err_args to the formal_tys, which we use below:

https://github.com/rust-lang/rust/blob/46e6c533d08a2c6d22083a2756a0b569e001c3c4/src/librustc_typeck/check/mod.rs#L3490-L3509

to actually check the types.

I'm not sure whether I should

Just modify the mismatch arg count error stuff, and in that case do similar type comparison logic with adjacent parameters to probe for missing arguments
Remove the mismatch arg count stuff entirely, and make the type comparison do the neighbor-probing

The first is likely overall faster (we're only doing the extra work when counts don't match, meaning we've definitely skipped or added extra args) but incomplete (if we've skipped and argument and added one extra, or swapped two arguments). Solving this problem in general and trying to figure out exact user intent is probably overkill, but would certainly be cool. Thoughts, @estebank?

Algorithm for detecting skips

These are the simplest cases we could try to detect:

function inputs:   A B C
      arguments:   A B C    // Typecheck!
      arguments:   A D C    // D is incorrect!
      arguments:   A C      // Forgot B!
      arguments:   A B D C  // D is extra!
      arguments:   A C B    // B and C are swapped!

Some more complex cases we might want to handle (assume AB could satisfy A or B):

function inputs:   A  B  C
      arguments:   A  BC B     // B and C are swapped!

(I'll add more to the above tables as I think of them

The really tricky part is trying to detect combinations of these in general, and I think starts getting into the domain of diminishing returns.

What I propose, at least for a first pass to introduce this probing, is to assume that at most one of these mistakes have been made, and if we can't find such a mistake that makes the rest of the types line up, fall back to the error reporting we have now. As always, need someone with more experience/context to help guide my decision making here.

estebank commented 5 years ago

Closures

One thing to note: the AST keeps closures as a function with a single argument which is a tuple. A closure |()| {} would be encoded in the AST as a callable with a single argument which is a tuple of a tuple, while || {} would be encoded as a callable with a single argument which is an empty tuple. That is part of what you're seeing there.

we assign a bunch of err_args to the formal_tys, which we use below:

As you can see we use tcx.types.err, which is a special type similar to ! and () in that it has special meaning but users have no way of creating this in rust code. We use this to signal parts of the AST or the type tree that have had problems. When doing typechecking, for example saying that foo and bar have to have the same type, of either one of them is a TyErr we take a note of it, make the resulting type TyErr and do not emit an error, because whatever created a TyErr has already emitted one. This way, TyErr is "viral" (in that it infects everything it touches) and has a "calming" effect (in that knock down errors caused by an earlier error are not emitted). This later behavior is useful for cases like the following:

let foo = <Something that causes a type error>; // We emit the original error that the user can fix
bar(foo); // We hide a type error that will be fixed if `foo` is fixed.

Marking all the arguments TyErr just hide all possible errors from arguments. This was to address similar concerns to this ticket: if the first argument of a 10 parameter function (don't write these :P) is missing, then you'd get 10 errors (1 for the mismatched argument count and 9 for type errors).

Also, there's an fn similar to demand_coerce (which generates a new type from two types in an expression, the one known so far and the expected one and stores it in the type tables) called can_coerce which will perform the same validation, but is side effect free and is suitable for error recovery when trying to check if different synthetic expressions would have been valid, like removing or adding a borrow.

thoughts?

For the two cases (incorrect order and missing args) I think that a reasonable approach would be:

If there are multiple fn parameters with the same type, keep the current behavior
Do the current type checking in order, and if there's a failure try to skip arguments until one succeeds, including the ones that have previously failed, or you reach the end. (This can devolve to O(n^2) depending on how its written, which should be fine if it only happens in error recovery.)
- If all args have been found to match some param, then emit a single error suggesting the resorting.
- Otherwise, current output
- The same approach can work for the case of missing or extra params.

I think you have a good base to work out in your summary, it's likely that we'll find things in the PR to change to make sure that we don't affect compilation times on the happy path, but it shouldn't be too hard.

workingjubilee commented 5 years ago

Only favoring one suggestion off the table you offer seems like it would make mathematical sense, I think, because 1) all suggestions are merely probable answers (though confidence in an answer might exceed 99.9%, eventually the suggestion will falter) but 2) due to simple rules of probability, it is more likely that one suggestion will be correct, rather than two suggestions both being correct. Even if two errors seems likely, the combinatorial logic weighs against it. 3) So because we are only reporting one, suggestions should be made in the order the compiler (engineer) has the most confidence that the suggestion is applicable at all, which can weigh more on mere evidence.

Quantumplation commented 5 years ago

@estebank

Thanks for the detailed writeup! Hopefully my verbose style won't be too big a drain on you as I start to learn more about the compiler. ;)

re: Closures, that's what I figured. Out of curiosity, why are closures stored this way?

re: err types, also in line with what I figured was happening.

re: algorithm, I wasn't too concerned about a high asymptotic implementation here, for two reasons. First, as you mention, it's error recovery so it doesn't impact the happy path, and two, the number of elements we're working with here is really small. Even in contrived cases, we're talking dozens of parameters to compare, not thousands. I'm glad you're also not too concerned.

If we were only worried about missing arguments, longest common subsequence solves this problem pretty well (same algorithm that powers textual diffs and such). However, I'm not sure how to adapt it to consider swaps as well, so I'll do some sketching and thinking on that over the next few days.

@workingjubilee My comment about assuming 1 error was less about making one suggestion, and more about assuming that there was only one mistake, and falling back if we couldn't find it. So, in these examples:

function inputs:   A B C
      arguments:   B        // A and C are missing!
function inputs:   A B C D
      arguments:   B A D    // A and B are swapped, and C is missing.  Or is A missing, and the third parameter is the wrong type?

Since (by your first point) we're not trying to figure out the "right" answer, only the "useful" answer, probing for just simple mistakes gives us good net value without huge investment in complexity.

Quantumplation commented 5 years ago

Algorithm

Been working through a POC of the algorithm I'll use ( in a language I'm more familiar with, still not fluent enough in Rust to make it my scratch pad ;) ).

https://glot.io/snippets/fhbpy4kp93

Essentially, you're filling in a matrix of whether various inputs can be coerced to the relevant parameters. A row or column of all falses means an input that can't be satisfies or an argument that can't satisfy anything.

It still doesn't look for swap detection, and is still funky around duplicated types, but I wanted to share my work in progress.

workingjubilee commented 4 years ago

So like

fn grid<T>(w: usize, h: usize) -> Vec<Vec<T>> {
    let mut v = vec![];
    for _ in 0..w {
        v.push(Vec::with_capacity(h));
    }
    v
}

// this next fn definitely doesn't work because it doesn't have a T yet
fn is_easygoing(par: T, arg: T) -> bool {
   par == arg
}

Hmmm....

Quantumplation commented 4 years ago

This fell off my radar for a while because the world got complicated, but I'm about to open a PR :)

rust-lang / rust