Add fuzzy-matching for autocomplete

tconbeer commented 1 month ago

ESPECIALLY for member autocomplete (fuzzy-matching table and column names, etc.).

Dotrar commented 3 weeks ago

Hey @tconbeer - this program is awesome! thanks for making it!

I'm really keen to have fuzzy-matching, i often don't know what column names are in my database, so i rely on fuzzy matching (in tools like dbeaver) to get them for me.

Would it be as simple as changing the line here for what constitutes a matched completion option? https://github.com/tconbeer/harlequin/blob/main/src/harlequin/autocomplete/completers.py#L52

I'd imagine, for fuzzy matching, everything would have a "score" which you could threshold.. or maybe, you'd just sort all options by the score (so that the list will be everything but the most likely options are near the top)

This could be quite slow in straight python... do you have any rules / guidelines for adding in C libs into the application? FZF comes to mind.

This might turn out to be a simple change, that I can have a look at doing if you would like.. I'm not sure what the difference between word and member completion classes are though 😅

tconbeer commented 3 weeks ago

Yes, you're on the right track there. To answer your questions:

The WordCompleter is called when you start typing any word character in the editor. It matches any keyword, db object, etc. The MemberCompleter is called when you type . (and any subsequent word characters); it returns matches that are members of a certain parent context (e.g., column names if the parent is a table). Both would likely benefit from fuzzy-matching, possibly with some different config. (There is also a PathCompleter but that's probably out of scope).
I agree that speed is important here. I'm fine adding c-extensions if they distribute wheels for most platforms. (e.g., I briefly looked at using a Trie for this, and marisa-trie was a candidate, since they distribute wheels for everything). If they don't have wheels, I'd want to add them as an optional dependency to make installing Harlequin as easy as possible for most people. RapidFuzz looks very promising. That said, I wouldn't rule out a pure-python approach; what stands today was my naive implementation that I assumed would be too slow but wasn't. A lot of string operations, regex, etc., are highly optimized by the stdlib already.
Textual already ships with a regex-based fuzzy matcher (source) that powers the command palette. I don't know if this is fast enough or provides the features we want but it could be a good starting point. If it's not fast enough, one basic optimization would be to replace their regex pattern with something that bounds the number of missing characters:

        self._query_regex = compile(
            ".{0,5}?".join(f"({escape(character)})" for character in query),
            flags=0 if case_sensitive else IGNORECASE,
        )

tconbeer / harlequin

Add fuzzy-matching for autocomplete #601