hoelzro opened 5 years ago
Another interesting data point for this: `e-mail` is treated as two tokens, which kind of screws things up.

Would it make sense just to use a tokenizer that recognizes certain exceptions (like `e-mail`) and certain special prefixes (like `re-`)? As an alternative to a list of exceptions, we could have logic that bundles prefixes of a certain length (e.g. 3 or fewer characters) with the token that follows.
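A minimal sketch of that prefix-bundling idea (the function name and the 3-character threshold are just illustrative, not anything the plugin currently does): split on whitespace and hyphens, but re-join a hyphenated part of 3 or fewer characters with the part that follows it, so `e-mail` and `re-enable` survive as single tokens:

```javascript
// Sketch: tokenize by whitespace, then split each word on hyphens,
// re-joining any hyphenated prefix of <= 3 characters ("e-", "re-")
// with the next part instead of emitting it on its own.
function tokenize(text) {
  const tokens = [];
  for (const word of text.toLowerCase().split(/\s+/)) {
    const parts = word.split('-').filter(p => p.length > 0);
    let i = 0;
    while (i < parts.length) {
      if (parts[i].length <= 3 && i + 1 < parts.length) {
        // Short prefix: bundle it with the following part.
        tokens.push(parts[i] + '-' + parts[i + 1]);
        i += 2;
      } else {
        tokens.push(parts[i]);
        i += 1;
      }
    }
  }
  return tokens;
}
```

Note that with a length cutoff like this, `e-mail` and `re-enable` stay whole, but `full-text` still splits into two tokens, since `full` is over the threshold.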
I might want to tweak how the plugin uses lunr to tokenize things, to handle hyphenated words or URLs.
Examples:
https://github.com/hoelzro/tw-full-text-search/issues/5#issuecomment-441724510
https://github.com/hoelzro/tw-full-text-search/blob/9d383acb81c61608b7b5cbc61ced161ce4d54c95/tests/test-simple.js#L269-L281