rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
99.01k stars 12.79k forks source link

Suggest giving recommending substitution based on naming conventions with available #64585

Open maaku opened 5 years ago

maaku commented 5 years ago

New rustacean here. In writing simple programs to get a feel for the language, I tried testing if a string slice is empty as the conditional for a if/else statement. Something like the following:

if mystr.empty() {
    // ...
} else {
    // ...
}

Of course the correct method is .is_empty(). I would suppose that this is a common mistake for people coming from languages like C++. It would be friendly of the compiler to recognize that in this boolean context there is a .is_empty() method that is the same name but with is_ prepended, takes the provided (no) parameters, and returns the required boolean, and suggest that alternative to the user in the error output.

varkor commented 5 years ago

If https://github.com/rust-lang/rust/issues/51398 was implemented, we could add empty as a synonym. However, we might also be able to improve our word-distance heuristic to catch examples such as this ("is_empty" shouldn't be very far from "empty").

sam09 commented 5 years ago

@varkor I looked at src/libsyntax/util/lev_distance.rs which has a function find_best_match_for_name. The function uses Levenshtein distance to find the closest match to a given word from a list of words.

My guess is that this function is the one used to suggest alternatives to the user in case there is no method with a given name. The caveat here is that the function takes the distance by default as one-third of the word size.

In the above case empty is of length 5 and one-third of that is 2. However, empty and is_empty are at a distance of 3 and hence the compiler doesn't suggest the correct alternative.

I tried some examples to verify this,

Distance 1

if mystr.isempty() {
    // ...
} else {
    // ...
}

error[E0599]: no method named `isempty` found for type `&str` in the current scope
if mystr.isempty() {
  |          ^^^^^^^ help: there is a method with a similar name: `is_empty`

Distance 2

if mystr._empty() {
    // ...
} else {
    // ...
}

error[E0599]: no method named `_empty` found for type `&str` in the current scope
if mystr._empty() {
  |          ^^^^^^^ help: there is a method with a similar name: `is_empty`
if mystr.iempty() {
    // ...
} else {
    // ...
}

error[E0599]: no method named `iempty` found for type `&str` in the current scope
if mystr.iempty() {
  |          ^^^^^^^ help: there is a method with a similar name: `is_empty`
if mystr.sempty() {
    // ...
} else {
    // ...
}

error[E0599]: no method named `sempty` found for type `&str` in the current scope
if mystr.sempty() {
  |          ^^^^^^^ help: there is a method with a similar name: `is_empty`
if mystr.jsempty() {
    // ...
} else {
    // ...
}
error[E0599]: no method named `jsempty` found for type `&str` in the current scope
 --> a.rs:3:10
  |
3 | if mystr.jsempty() {
  |          ^^^^^^^ help: there is a method with a similar name: `is_empty`

I think distance should probably be a function of the size, smaller words might need a larger distance.

Again, I am not sure if this is the same function that is being used to find the closest name to a method

varkor commented 5 years ago

@sam09: I think this is the right function. I agree that it'd be helpful to have a higher cutoff for smaller words. You could try using a slightly different default distance function, and see if there are many changes after running x.py test src/test/ui --bless. I almost feel Levenshtein distance isn't the right metric — inserting consecutive new characters should have a lower distance than the naïve estimate. If you were interested, you could play around with some other examples, see if you can find a better estimator, and open a pull request with it?

maaku commented 5 years ago

I’m not sure that is the right approach. If you allow a larger Levenstein distance you’ll just let in an lot of totally unrelated method names. The idea is that the compiler can exploit knowledge about naming conventions to give better suggestions. So, e.g. use the current cut offs but compare against ‘is_{}’ as well, at least in a Boolean context.

estebank commented 5 years ago

An option would be to try both levenstein distance and substring containment, as in for possible in suggestions { if possible.contains(&name) ... }.

I would say that this would need a cutoff of words shorter than 4 or 5 chars to avoid matching things like is, but I wouldn't be surprised to find that this approach would work quite well in the wild.

estebank commented 1 year ago

Current output:

error[E0599]: no method named `empty` found for struct `Vec<_>` in the current scope
 --> src/main.rs:2:11
  |
2 | if vec![].empty() {
  |           ^^^^^ help: there is a method with a similar name: `is_empty`

We still don't check for an appropriate return type though, it is purely based on levenshtein distance.