stdlib-js / stdlib

✨ Standard library for JavaScript and Node.js. ✨
https://stdlib.io
Apache License 2.0
4.42k stars 468 forks source link

[RFC]: add fuzzy auto-completion in REPL #1845

Open Snehil-Shah opened 8 months ago

Snehil-Shah commented 8 months ago

Description

This RFC proposes adding fuzzy auto-completion extending the current strict auto-completion. This would allow us to forgive things like spelling mistakes and suggest more relevant completions.

Related Issues

Related issues https://github.com/stdlib-js/google-summer-of-code/issues/1

Questions

I played around a bit trying to write a fuzzy matching algorithm and using existing ones. Right now we are lexicographically sorting the completion results. should we do the same with fuzzy results mixed in? or should aim to sort results by 'relevancy'?

Other

No.

Checklist

kgryte commented 7 months ago

@Snehil-Shah Thanks for opening this. I think if we are to add support for fuzzy auto-completion, we'd want to rank/list according to relevancy, similar to how fuzzy search might work in a search engine. Lexicographic makes sense currently, as completions are exact prefix matches and we don't have additional criteria for sorting otherwise.

Similar to a search engine displaying results, we probably want a means for visually highlighting what is matching (e.g., if I type ys, then we could mark the fuzzy match in bold yes).

kgryte commented 7 months ago

Recently came across https://github.com/JunoLab/FuzzyCompletions.jl/blob/master/src/FuzzyCompletions.jl which uses Levenshtein distance and a fudge factor for scoring suggestions.

kgryte commented 7 months ago

...and also a fuzzy completer implementation in Prompt Toolkit: https://github.com/prompt-toolkit/python-prompt-toolkit/blob/master/src/prompt_toolkit/completion/fuzzy_completer.py

Snehil-Shah commented 7 months ago

@kgryte I don't think levenshtein distance is a good algorithm for code-completion as mentioned in OP of #1855 as a difference in length of the input and completion would also affect the score, which means it can score an exact prefix match worse than a not-so-exact match with a shorter length.

The prompt toolkit's algorithm is good and simple, but is not forgiving to spelling mistakes, as it requires each letter of the input string to exist in the completion string. I think we can modify this a bit using my own algorithm (#1855) to account for spelling mistakes as well

kgryte commented 7 months ago

@Snehil-Shah Yes, that's fair. Although you could probably combine Levenshtein with other approaches, such as favoring exact prefix matches.

Snehil-Shah commented 7 months ago

@Snehil-Shah Yes, that's fair. Although you could probably combine Levenshtein with other approaches, such as favoring exact prefix matches.

I think prompt toolkit's algorithm would be better than the levenshtein's. I edited the above reply in case you didn't read that..

kgryte commented 7 months ago

@Snehil-Shah That's fair. At this point, you are the expert!

kgryte commented 7 months ago

Another fuzzy auto-completion implementation: https://github.com/codemirror/autocomplete/blob/5ad2ebc861f2f61cdc943fc087a5bfb756a7d0fa/src/filter.ts

kgryte commented 7 months ago

Another fuzzy match algorithm: https://github.com/junegunn/fzf/blob/master/src/algo/algo.go

kgryte commented 7 months ago

Another fuzzy match algorithm: https://github.com/helix-editor/nucleo/tree/master/matcher/src