"Space word jump" heuristic works incorrectly when abbreviation has a space

dko-slapdash commented 4 years ago

I'm trying to set SCORE_CHARACTER_JUMP=0 to disable matching mid-word characters. This is to e.g. not let “Filter by: Assigned To” string to match "test" abbreviation which makes sense. Like "words prefix-only search".

But it has some side effect in how the library processes spaces.

Imagine for simplicity that we have string="se f" and abbreviation="s f"

In this case, we expect that the abbreviation will perfectly match the string, but it doesn't happen - the score returned is 0.

I think the solution is following:

I.e. if the currently checked abbreviation character is a space, and there is a space in the string, we should continue no matter how far this space is (not necessarily if the current string character is also a space).

ConradIrwin commented 4 years ago

@dko-slapdash Thanks for the report!

I think that setting SCORE_CHARACTER_JUMP = 0 is not quite what you want to do, because in the example you give you're jumping a character (e).

In general I'd like for command-score to allow aggressive abbreviations so act should definitely be allowed to match account; or ace throw. However as you point out, it's not clear that it's useful for act to match anchor point because the jump is across a word-boundary to not-the-start-of-the-next-word.

To fix that we'd need to teach the library to notice this kind of leap explicitly, which may be a little more complicated than the change you propose, and I'd be open to trying it out and seeing if the experience is better or worse.

FWIW: At superhuman we use this library in a UX that selects one item, and suggests three more out of a list that's a few hundred items long. In that scenario, it's OK for words that don't match anything very well to suggest something because it's better than just showing an error.

dko-slapdash commented 4 years ago

Thanks for the quick reply! We're actually experimenting with SCORE_CHARACTER_JUMP=0, SCORE_TRANSPOSITION=0 mode (because without SCORE_TRANSPOSITION=0, having just SCORE_CHARACTER_JUMP=0 is not enough). Considering that it's unlikely that people would search by mid-word characters (and also since our strings are 3-4 word sentences, not like CamelCaseIdentifiers or file_names.ts.

We also don't limit the number of matched strings, i.e. we use superhumanCommandScore as both filtering AND ranking tool.

ConradIrwin commented 4 years ago

If you only want to support matches that are prefix matches of words, then you can likely save a lot of computation time by reducing the number of matches considered.

Right now we do index = lowerString.indexOf(abbreviationChar, index + 1); to iterate over every potential character match. You can probably change this to jump to the next match of /\s{abbreviationChar}/ if you don't want in-word jumps at all, or something a bit more complicated if you want in-word jumps in the current word, but not jumps across spaces.

dko-slapdash commented 4 years ago

I think \s approach won’t work, because I want “sefi” (or “se fi” after the change I posted above) to match “search files”. I.e. here “e” matches a mid-word character, but it’s okay, because it goes after “s” which matches a start-word character; same is with “i” after “f”.

superhuman / command-score

"Space word jump" heuristic works incorrectly when abbreviation has a space #9