Closed overengineered closed 3 days ago
I think ranking level is not really appropriate for controlling this functionality. As an example for query defer
with proposed typo tolerance the following matches are ordered like this:
lawyer defends victims United States of America indestructible dog crate
All those results are not good matches for the query, but when ordering for similarity to defer
in my mind this order seems correct. Only the second match would be findable without typo tolerance. Essentially, typo tolerance adds both "better quality" results and "lower quality" results. So it seems that it would be best for typo tolerance to be in MATCHES ranking and let scoring function do its work.
In my view the MATCHES ranking is for results where no good matches were found and we are scraping the bottom of the barrel to see if something might be similar. Even current approach for MATCHES ranking can find lots of weird matches - and proposed changes do not drastically change behaviour. If spurious matches are not appropriate, increasing threshold
seems like the best way to turn it off.
But if this new matching should be controlled separately, I would do it like this:
interface MatchSorterOptions<ItemType = unknown> {
<..>
sorter?: Sorter<ItemType>
+ fuzzy?: 'scattered' | 'partial'
}
I think true
/ false
do work here as current MATCHES behaviour is an already little bit fuzzy. I think these constants fittingly describe behaviour differences.
I just had an idea to allow typo tolerance with ranking like this:
- MATCHES
+ TIGHT_PARTIAL
+ SCATTERED
Basically typo tolerant matching would be higher than the current MATCHES matching. However my proposed code change would have to be modified so that it allowed skipping letters only if matching letters are packed tightly. I would imagine restricting finding all the letters except one within a span that is just a few letters longer than initial query would almost always be higher quality match than finding all letters, but widely scattered.
I think you're right. This is fine. If people find it's problematic then we can ship a breaking change with new levels.
:tada: This PR is included in version 6.4.0 :tada:
The release is available on:
Your semantic-release bot :package::rocket:
FYI, this change caused one of our automated tests to fail: https://github.com/ariakit/ariakit/pull/4218/files#diff-ffb64734a719f82219f5f82d3ee8c52e111c4a560940b6bb14b60ab6d3e74de0
This isn't a problem. I'll update the test. The UI works well, and I believe it's a good feature. I'm sharing this so you might consider making it optional to avoid any friction until the next major.
@diegohaz, I think I'll just do a major version bump and deprecate 6.4.0
I've deprecated 6.4.0 and released 7.0.0: https://github.com/kentcdodds/match-sorter/releases/tag/v7.0.0
What: Increases tolerance for typos in data and/or query text
Why: match sorter currently finds data with some typos, but others. E.g. "canceled" would find "cancelled", but not "cacneled".
How:
getClosenessRanking
calculator allows skipping one of the characters from the query. This allows finding data that contains the most common typos: swapped letters, missing letters and diverging spelling (localisation/localization). Matches that have missing letters have reduced ranking.Checklist:
Since this change only affects searches where no good match (like full word, or exact start or word) gets found, I think there's no need to have add to options some flag to disable/enable this feature. If searching
ua
and findingUnited States of America
is good enough right now, then my changes that improve tolerance for typos should be also acceptable.But I can see that there could be some value in providing configuration for this new feature - and I am open to do it.