dumbmatter commented 1 month ago

#153 is undoubtedly a great feature for many people, but not for me :)

The argument from @overengineered at #153 is:

match sorter currently finds data with some typos, but not others. E.g. "canceled" would find "cancelled", but not "cacneled".

Personally, I don't see the old behavior (allowing skipped letters) as a "typo". I often do that on purpose to search for multiple parts of a string. Like when searching a list of filenames, I type a few letters for the start of a filename, realize "oh this is a really common start of a filename, it's returning way too many results", and then start typing a later part to filter more precisely. That's because I'm used to how ctrl+p works in Sublime/VSCode/etc.
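To make the "skipped letters" behavior concrete, here is a minimal sketch of in-order matching with gaps. It is not match-sorter's actual implementation; `matchesWithGaps` and the file names are hypothetical and used only for illustration.

```typescript
// Hypothetical helper, not part of match-sorter's API.
// Returns true when every character of `query` appears in `candidate`
// in the same order, with any number of other characters in between.
function matchesWithGaps(candidate: string, query: string): boolean {
  const haystack = candidate.toLowerCase();
  const needle = query.toLowerCase();
  let position = 0;
  for (let i = 0; i < needle.length; i++) {
    const found = haystack.indexOf(needle[i], position);
    if (found === -1) return false; // this letter never appears after the previous one
    position = found + 1;
  }
  return true;
}

// Made-up file list: type the common prefix, then a later part of the name.
const files = [
  "src/components/Button.tsx",
  "src/components/Banner.tsx",
  "src/utils/format.ts",
];
const hits = files.filter((file) => matchesWithGaps(file, "srcbtn"));
```

Under this rule the query "canceled" matches "cancelled" (the extra "l" is just a gap), while "cacneled" does not, because its letters are out of order.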

Now, sometimes the matches found by the new typo algorithm rank higher than ones that would have appeared previously with skipped letters. But more fundamentally, for my personal use, I never want typos (again, not considering the old behavior as "typos") returned. All they do is confuse my search by adding things I never want to select.

So an option to disable this behavior would be appreciated. https://github.com/kentcdodds/match-sorter/pull/153#issuecomment-2407748823 has a suggestion for how to do that, although to be honest I'm not totally sure which approach is intended to be "scattered" and which is supposed to be "partial", or whether you'd be able to enable/disable only the old approach or only the new approach. So I'd probably just make them separate booleans, maybe fuzzyGaps and fuzzySkipOne, idk, naming stuff is hard.

If you think this is a good idea and a PR would help move things along lmk, ideally with some comment on what the option(s) should be called.

kentcdodds commented 1 month ago

sometimes the matches found by the new typo algorithm rank higher than ones that would have appeared previously with skipped letters

I would consider this to be a bug. It's possible this change was ill-advised and I'm open to reverting it. I probably should have tried it out in an actual implementation to see how it feels. Sounds like that's what you did and you're suggesting it feels wrong/awkward. I'm more inclined to revert the change and let @overengineered have support for this via a personal fork instead.

I would rather do that than complicate the API with an option to disable this behavior. Thoughts?

overengineered commented 1 month ago

I think there are two kinds of datasets to search in: curated datasets and user-generated content. @dumbmatter seems to be searching a curated dataset, and I can see how my change could make the ordering suboptimal. My use case is searching user-generated content, where mistakes in the data sometimes make some content unfindable.

I would propose to introduce "pseudo" ranking in the API to enable/disable this behaviour.

  MATCHES: 1,
+ PARTIALLY_MATCHES: 0.875,
  NO_MATCH: 0,

Skipping a letter from the query would be enabled only if the user of the library explicitly passes the PARTIALLY_MATCHES threshold.

EDIT: Upon further thought, it's best to have a real ranking for this, lower than MATCHES. For my use-case mixing partial and "scattered" matches would work better, but sorting partial matches below them is acceptable.
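A rough sketch of how such a ranking could behave, assuming the typo handling amounts to dropping one letter from the query. The rank values are the ones from the snippet above; `isSubsequence`, `rankItem`, and `search` are hypothetical names and not match-sorter's real internals.

```typescript
// MATCHES and NO_MATCH values come from the proposal; PARTIALLY_MATCHES is the proposed addition.
const rankings = { MATCHES: 1, PARTIALLY_MATCHES: 0.875, NO_MATCH: 0 } as const;

// True when the query letters appear in order in the candidate ("scattered" match).
function isSubsequence(candidate: string, query: string): boolean {
  let position = 0;
  for (let i = 0; i < query.length; i++) {
    const found = candidate.indexOf(query[i], position);
    if (found === -1) return false;
    position = found + 1;
  }
  return true;
}

// Assumption: a "partial" match is a scattered match after skipping one query letter.
function rankItem(candidate: string, query: string): number {
  if (isSubsequence(candidate, query)) return rankings.MATCHES;
  for (let i = 0; i < query.length; i++) {
    const shortened = query.slice(0, i) + query.slice(i + 1);
    if (shortened.length > 0 && isSubsequence(candidate, shortened)) {
      return rankings.PARTIALLY_MATCHES;
    }
  }
  return rankings.NO_MATCH;
}

// Partial matches are returned only when the caller lowers the threshold explicitly.
function search(
  items: string[],
  query: string,
  threshold: number = rankings.MATCHES,
): string[] {
  return items
    .map((item) => ({ item, rank: rankItem(item, query) }))
    .filter(({ rank }) => rank > rankings.NO_MATCH && rank >= threshold)
    .sort((a, b) => b.rank - a.rank) // partial matches sort below scattered ones
    .map(({ item }) => item);
}

const strict = search(["cancelled", "cacneled"], "cacneled");
const lenient = search(["cancelled", "cacneled"], "cacneled", rankings.PARTIALLY_MATCHES);
```

With the default threshold only "cacneled" comes back; with the lowered threshold "cancelled" is also returned, ranked after it.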

kentcdodds commented 1 month ago

Let's do that and see how it goes

kentcdodds / match-sorter

Add an option to disable "support finding data with typos" from version 6.4.0/7.0.0 #154
