swarn / fzy-lua

A lua implementation of the fzy fuzzy matching algorithm
MIT License
69 stars 6 forks source link

fix: Fix perfect match scenario #6

Closed Quenty closed 1 year ago

Quenty commented 1 year ago

In the Lua implementation the perfect match scenario had a bug where it would consider any string of the same length as a perfect match. This fixes it.

swarn commented 1 year ago

Wow, that's an ugly bug. Thanks for the PR! I will merge it and add a few tests in the next day or two.

swarn commented 1 year ago

@Quenty, do you have a test case that demonstrates the bug?

Below is my current thinking; what am I missing?

For two strings a and b, has_match(a, b, true) must be true before you call score(a, b, true). And in that case, a must be a case-sensitive subsequence of b, so if they have the same length, they match.

Quenty commented 1 year ago

Ah, it occurs if you score directly without calling filter(). I guess I've been using this library wrong. :P Since all members of the library appear to be public, especially given testing, it's perfectly reasonable to use the score method without calling hasMatch().

Performance implications here are pretty small, so it's worth including this for completeness.

swarn commented 1 year ago

You're right that the API is a bit fishy. There is an invariant,

has_match(a, b, s) == true

which is required before calling score(a, b, s), but that invariant is not enforced by the API, and that has always made me a little queasy.

I disagree about the performance implications. I made this for use in text editor fuzzy finders, where it may be used to find across all file names in repository. That can be a very large haystack, and if the UI using the library updates on every keystroke, the latency matters. Adding an extra case conversion and string compare to score has notable effects on performance.

This is also the reason for the division of has_match and score. The has_match function is faster and will tend to remove most results before the slower score function is applied. The filter function is the safe version which does both, but the others can be useful on their own, so they remain in the public API.

I hadn't considered calling score without has_match. It looks like it returns fzy.get_score_min() for non-matching needles, but I'd have to do a bit of work to be convinced that's true in all cases.

Any thoughts? Thanks for the feedback so far, and if you have any thoughts on how I should modify the documentation to make the oddness of score stand out more, I'd appreciate it.