Quantumplation opened 5 years ago
I believe that the only place you'll have to look at to implement this is:
https://github.com/rust-lang/rust/blob/46e6c533d08a2c6d22083a2756a0b569e001c3c4/src/librustc_typeck/check/mod.rs#L3413-L3421
@rustbot claim
I'll give this a shot! Might be too ambitious for me though. :)
I have a remark that built up while I was reviewing the related diagnostic commits/issues that have been floating around:
E0631 uses a concept of "found/expected", as does E0271, but these errors have been reported as confusing and backwards. So it seems wise to head off future confusion: the implied "directionality" can be read as flowing one way or the other depending on perspective, and might in fact change if a compiler optimization happens to adjust the order of things.
E0281, the error code that E0631 retired, used phrasing more along the lines of "Z has Y, but that requires A according to B", which is more time-agnostic and resilient to refactoring (especially because it describes more closely the "this atom is being asked to be both Hydrogen and Helium" problem). The only drawback is that diagnostics are probably best kept relatively terse and simple, but explicitness is still preferable.
@workingjubilee I had noticed the same. Especially confusing is that in the code for #64915, variable names seem to have gotten swapped around (in multiple places!), which made it really hard to trace through what happened...
Some notes for myself, while I begin to understand the code and think through the problem. Please feel free to chime in if you notice something I'm missing or misinterpreting!
I think
is where we handle closure cases (this `tuple_arguments` call seems to be set to `TupleArguments` for closures and overloaded functions), so it could probably benefit from the same enhancement.
Similarly,
handles external variadic C functions, and if we have a description of the types, we could suggest what's missing.
Here, we report the mismatched-arg-count error, and then here:
we assign a bunch of `err_args` to the `formal_tys`, which we use below:
to actually check the types.
I'm not sure whether I should run this probing only when the argument counts don't match, or any time argument type checking fails. The first is likely faster overall (we only do the extra work when the counts don't match, meaning the user has definitely skipped or added arguments), but it's incomplete (it misses cases where we've skipped one argument and added an extra one, or swapped two arguments). Solving this problem in general and trying to figure out exact user intent is probably overkill, but it would certainly be cool. Thoughts, @estebank?
These are the simplest cases we could try to detect:

```
function inputs: A B C
arguments:       A B C   // Typecheck!
arguments:       A D C   // D is incorrect!
arguments:       A C     // Forgot B!
arguments:       A B D C // D is extra!
arguments:       A C B   // B and C are swapped!
```
Some more complex cases we might want to handle (assume `AB` could satisfy `A` or `B`):

```
function inputs: A B C
arguments:       A BC B // B and C are swapped!
```
(I'll add more to the above tables as I think of them.)
The really tricky part is trying to detect combinations of these in general, and I think that starts getting into the domain of diminishing returns.
What I propose, at least for a first pass at introducing this probing, is to assume that at most one of these mistakes has been made, and, if we can't find such a mistake that makes the rest of the types line up, fall back to the error reporting we have now. As always, I need someone with more experience/context to help guide my decision making here.
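To make the "at most one mistake" idea concrete, here's a rough standalone sketch (not rustc code): types are stand-in strings, and `compatible` is a placeholder for a real side-effect-free coercion check like the one discussed below. It only recognizes the simple cases from the table and bails out otherwise.

```rust
// Sketch of the "assume at most one mistake" probe, with stand-in types.
#[derive(Debug, PartialEq)]
enum Mistake {
    None,                  // everything lines up
    WrongType(usize),      // the argument at this index has the wrong type
    Missing(usize),        // the parameter at this index has no argument
    Extra(usize),          // the argument at this index is superfluous
    Swapped(usize, usize), // two arguments are in each other's place
    Unknown,               // more than one mistake: fall back to current errors
}

// Placeholder for a real coercion check; here, plain equality.
fn compatible(param: &str, arg: &str) -> bool {
    param == arg
}

fn all_match(params: &[&str], args: &[&str]) -> bool {
    params.len() == args.len()
        && params.iter().zip(args).all(|(p, a)| compatible(p, a))
}

fn probe(params: &[&str], args: &[&str]) -> Mistake {
    if all_match(params, args) {
        return Mistake::None;
    }
    if params.len() == args.len() {
        let bad: Vec<usize> = (0..params.len())
            .filter(|&i| !compatible(params[i], args[i]))
            .collect();
        if bad.len() == 1 {
            return Mistake::WrongType(bad[0]);
        }
        if bad.len() == 2 {
            let (i, j) = (bad[0], bad[1]);
            if compatible(params[i], args[j]) && compatible(params[j], args[i]) {
                return Mistake::Swapped(i, j);
            }
        }
    } else if args.len() + 1 == params.len() {
        // One argument missing: try skipping each parameter in turn.
        for i in 0..params.len() {
            let mut ps = params.to_vec();
            ps.remove(i);
            if all_match(&ps, args) {
                return Mistake::Missing(i);
            }
        }
    } else if args.len() == params.len() + 1 {
        // One argument extra: try dropping each argument in turn.
        for i in 0..args.len() {
            let mut ars = args.to_vec();
            ars.remove(i);
            if all_match(params, &ars) {
                return Mistake::Extra(i);
            }
        }
    }
    Mistake::Unknown
}

fn main() {
    let params = ["A", "B", "C"];
    assert_eq!(probe(&params, &["A", "D", "C"]), Mistake::WrongType(1));
    assert_eq!(probe(&params, &["A", "C"]), Mistake::Missing(1));
    assert_eq!(probe(&params, &["A", "B", "D", "C"]), Mistake::Extra(2));
    assert_eq!(probe(&params, &["A", "C", "B"]), Mistake::Swapped(1, 2));
}
```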
Closures
One thing to note: the AST keeps closures as a function with a single argument which is a tuple. A closure `|()| {}` would be encoded in the AST as a callable with a single argument which is a tuple of a tuple, while `|| {}` would be encoded as a callable with a single argument which is an empty tuple. That is part of what you're seeing there.
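At the surface level the distinction looks like this (the helper functions here are made up for illustration, not compiler code):

```rust
// `|()| {}` is a closure taking ONE argument, of type `()`;
// `|| {}` is a closure taking NO arguments.
fn takes_unit_arg<F: Fn(()) -> &'static str>(f: F) -> &'static str {
    f(())
}

fn takes_no_args<F: Fn() -> &'static str>(f: F) -> &'static str {
    f()
}

fn main() {
    assert_eq!(takes_unit_arg(|()| "one () argument"), "one () argument");
    assert_eq!(takes_no_args(|| "zero arguments"), "zero arguments");
    // Mixing them up is a type error, e.g.:
    // takes_no_args(|()| "oops"); // closure expected to take 0 arguments
}
```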
> we assign a bunch of `err_args` to the `formal_tys`, which we use below:
As you can see we use `tcx.types.err`, which is a special type similar to `!` and `()` in that it has special meaning but users have no way of creating it in Rust code. We use it to signal parts of the AST or the type tree that have had problems. When doing typechecking, for example when saying that `foo` and `bar` have to have the same type, if either one of them is a `TyErr` we take note of it, make the resulting type `TyErr`, and do not emit an error, because whatever created the `TyErr` has already emitted one. This way, `TyErr` is "viral" (in that it infects everything it touches) and has a "calming" effect (in that knock-on errors caused by an earlier error are not emitted). This latter behavior is useful for cases like the following:
```rust
let foo = <Something that causes a type error>; // We emit the original error that the user can fix
bar(foo); // We hide a type error that will be fixed if `foo` is fixed.
```
Marking all the arguments `TyErr` just hides all possible errors from the arguments. This was to address concerns similar to this ticket's: if the first argument of a 10-parameter function (don't write these :P) is missing, then you'd get 10 errors (1 for the mismatched argument count and 9 for type errors).
Also, there's a fn similar to `demand_coerce` (which generates a new type from the two types in an expression, the one known so far and the expected one, and stores it in the type tables) called `can_coerce`, which performs the same validation but is side-effect free, making it suitable for error recovery when trying to check whether different synthetic expressions would have been valid, like removing or adding a borrow.
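A toy sketch of that distinction, with stand-in types and made-up signatures (the real `demand_coerce`/`can_coerce` in rustc look nothing like this; this only illustrates "recording" vs. "side-effect-free, safe to call speculatively"):

```rust
use std::collections::HashMap;

// Stand-in "type table": expression id -> resolved type (types as strings).
struct TypeTables {
    node_types: HashMap<u32, String>,
}

// Stand-in for `can_coerce`: side-effect free, so it is safe to call
// speculatively while probing synthetic expressions. Toy rule: equality only.
fn can_coerce(found: &str, expected: &str) -> bool {
    found == expected
}

// Stand-in for `demand_coerce`: the same check, but it RECORDS the resulting
// type in the tables, so it must not be used for speculative probing.
fn demand_coerce(tables: &mut TypeTables, expr_id: u32, found: &str, expected: &str) -> bool {
    if can_coerce(found, expected) {
        tables.node_types.insert(expr_id, expected.to_string());
        true
    } else {
        false
    }
}

// Error recovery: ask "would this have type-checked if we added or removed
// a borrow?" using only the side-effect-free check.
fn probe_borrow_fixes(found: &str, expected: &str) -> Option<String> {
    if can_coerce(&format!("&{}", found), expected) {
        return Some(format!("try borrowing: `&{}`", found));
    }
    if let Some(inner) = found.strip_prefix('&') {
        if can_coerce(inner, expected) {
            return Some("try removing the borrow".to_string());
        }
    }
    None
}

fn main() {
    let mut tables = TypeTables { node_types: HashMap::new() };
    assert!(demand_coerce(&mut tables, 0, "u32", "u32"));
    assert_eq!(
        probe_borrow_fixes("&u32", "u32"),
        Some("try removing the borrow".to_string())
    );
    assert_eq!(probe_borrow_fixes("u32", "String"), None);
}
```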
thoughts?
For the two cases (incorrect order and missing args) I think that a reasonable approach would be:
I think you have a good base to work from in your summary; it's likely that we'll find things in the PR to change to make sure we don't affect compilation times on the happy path, but it shouldn't be too hard.
Favoring only one suggestion from the table you offer seems like it would make mathematical sense, I think, because:
1. all suggestions are merely probable answers (though confidence in an answer might exceed 99.9%, eventually the suggestion will falter); but
2. by simple rules of probability, it is more likely that one suggestion is correct than that two suggestions are both correct. Even if two errors seem likely, the combinatorial logic weighs against it;
3. so, because we are only reporting one, suggestions should be made in the order in which the compiler (engineer) has the most confidence that the suggestion is applicable at all, which can weigh more heavily on the evidence at hand.
@estebank
Thanks for the detailed writeup! Hopefully my verbose style won't be too big a drain on you as I start to learn more about the compiler. ;)
re: Closures, that's what I figured. Out of curiosity, why are closures stored this way?
re: err types, also in line with what I figured was happening.
re: the algorithm, I wasn't too concerned about high asymptotic complexity here, for two reasons. First, as you mention, it's error recovery, so it doesn't impact the happy path; and second, the number of elements we're working with here is really small. Even in contrived cases, we're talking dozens of parameters to compare, not thousands. I'm glad you're also not too concerned.
If we were only worried about missing arguments, longest common subsequence solves this problem pretty well (same algorithm that powers textual diffs and such). However, I'm not sure how to adapt it to consider swaps as well, so I'll do some sketching and thinking on that over the next few days.
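For reference, here's a minimal LCS sketch over stand-in type lists (plain strings, exact-match only; not hooked into anything), recovering which parameters went unmatched:

```rust
// Classic dynamic-programming LCS table: dp[i][j] is the LCS length of the
// first i parameters against the first j arguments.
fn lcs_table(params: &[&str], args: &[&str]) -> Vec<Vec<usize>> {
    let (n, m) = (params.len(), args.len());
    let mut dp = vec![vec![0; m + 1]; n + 1];
    for i in 0..n {
        for j in 0..m {
            dp[i + 1][j + 1] = if params[i] == args[j] {
                dp[i][j] + 1
            } else {
                dp[i][j + 1].max(dp[i + 1][j])
            };
        }
    }
    dp
}

// Walk the table backwards: parameters not on the LCS path are candidates
// for "you forgot this argument" suggestions.
fn missing_params(params: &[&str], args: &[&str]) -> Vec<usize> {
    let dp = lcs_table(params, args);
    let (mut i, mut j) = (params.len(), args.len());
    let mut missing = vec![];
    while i > 0 {
        if j > 0 && params[i - 1] == args[j - 1] {
            i -= 1;
            j -= 1;
        } else if j > 0 && dp[i][j - 1] >= dp[i - 1][j] {
            j -= 1; // extra argument; skip it
        } else {
            missing.push(i - 1); // parameter i-1 has no matching argument
            i -= 1;
        }
    }
    missing.reverse();
    missing
}

fn main() {
    // fn f(A, B, C) called with (A, C): B (index 1) was skipped.
    assert_eq!(missing_params(&["A", "B", "C"], &["A", "C"]), vec![1]);
}
```

As noted, this handles skipped (and extra) arguments naturally, but a pure LCS won't report swaps: swapped arguments just fall off the common subsequence.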
@workingjubilee My comment about assuming 1 error was less about making one suggestion, and more about assuming that there was only one mistake, and falling back if we couldn't find it. So, in these examples:
```
function inputs: A B C
arguments:       B     // A and C are missing!

function inputs: A B C D
arguments:       B A D // A and B are swapped, and C is missing.
                       // Or is A missing, and the third parameter the wrong type?
```
Since (by your first point) we're not trying to figure out the "right" answer, only the "useful" answer, probing for just simple mistakes gives us good net value without huge investment in complexity.
Been working through a POC of the algorithm I'll use (in a language I'm more familiar with; I'm still not fluent enough in Rust to make it my scratch pad ;) ).
https://glot.io/snippets/fhbpy4kp93
Essentially, you're filling in a matrix of whether the various inputs can be coerced to the relevant parameters. A row or column of all falses means an input that can't be satisfied or an argument that can't satisfy anything.
It still doesn't look for swap detection, and is still funky around duplicated types, but I wanted to share my work in progress.
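The matrix idea transcribes to Rust roughly like this (a sketch, with `compatible` as a stand-in for a real coercion check, not the code from the snippet):

```rust
// Placeholder coercion check; a real version would use something like
// `can_coerce` instead of string equality.
fn compatible(param: &str, arg: &str) -> bool {
    param == arg
}

// Grid of "can argument j satisfy parameter i?".
fn compatibility_matrix(params: &[&str], args: &[&str]) -> Vec<Vec<bool>> {
    params
        .iter()
        .map(|p| args.iter().map(|a| compatible(p, a)).collect())
        .collect()
}

fn main() {
    let params = ["A", "B", "C"];
    let args = ["A", "C"];
    let m = compatibility_matrix(&params, &args);
    // A parameter whose row is all false has no argument that could satisfy it.
    let unsatisfied: Vec<usize> = (0..params.len())
        .filter(|&i| m[i].iter().all(|&ok| !ok))
        .collect();
    // An argument whose column is all false cannot satisfy any parameter.
    let useless: Vec<usize> = (0..args.len())
        .filter(|&j| (0..params.len()).all(|i| !m[i][j]))
        .collect();
    assert_eq!(unsatisfied, vec![1]); // `B` has nothing to satisfy it
    assert!(useless.is_empty());      // both arguments fit somewhere
}
```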
So like
```rust
fn grid<T>(w: usize, h: usize) -> Vec<Vec<T>> {
    let mut v = vec![];
    for _ in 0..w {
        v.push(Vec::with_capacity(h));
    }
    v
}

// this next fn needs a type parameter with an equality bound to work
fn is_easygoing<T: PartialEq>(par: T, arg: T) -> bool {
    par == arg
}
```
Hmmm....
This fell off my radar for a while because the world got complicated, but I'm about to open a PR :)
While discussing #64915 with @estebank, I asked if it would make sense to also provide a more detailed report about how closures mismatched. For example, introspecting on the types to detect when types are in the wrong order, or when arguments are mismatched.
He pointed out that we don't even do that for regular function calls right now. For example:
https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=8288d94517c964071278be6833560f70
The above snippet could, in theory, provide a more detailed report which suggested that the arguments might be swapped in the first call, or that the second argument is missing in the second call. (Things like longest common subsequence could help here).
@estebank said he's been thinking about something like this for a while, and offered to provide some mentorship if I wanted to tackle this as a slightly bigger contribution, since it'd be easier than closure comparison off the bat.
This issue has been assigned to @Quantumplation via this comment.