[Feature request] consider moving to uFuzzy

KraXen72 commented 1 year ago

Is your feature request related to a problem? Please describe.
https://github.com/leeoniya/uFuzzy is apparently faster & yields more results than minisearch. see comparison: https://leeoniya.github.io/uFuzzy/demos/compare.html?libs=uFuzzy,MiniSearch&search=super%20ma

Describe the solution you'd like
if not too much hassle, swap searching library to uFuzzy

Describe alternatives you've considered
continue using minisearch, as it is ok now.

Additional context
more info about uFuzzy in their readme

scambier commented 1 year ago

I can take a look to see how it performs (in terms of speed and quality) just out of curiosity, but unless I'm blown away by the results, there's little chance I'm going to replace Minisearch.

leeoniya commented 1 year ago

> but unless I'm blown away by the results

when not trying to fuzzy match, MiniSearch is one of the better ones (fast to search, no terrible matches, but startup indexing still takes 600ms vs 0.5ms for my 162k test dataset and consumes hundreds of MB ram), so you won't be as blown away as someone coming from Fuse.

that being said, MiniSearch is quite bad outside of exact matches or exact prefixes. its fuzzy option brings back completely irrelevant results so is not good for typo tolerance. it also suffers from the same problem as many other searches -- it produces a single "score" that is then used for ordering. if we change uFuzzy settings to behave similarly to MiniSearch in terms of matching (outOfOrder, exact + prefixes), you can see how a slight change in its scores puts the results in 🤯 order. i don't think this is something that can fundamentally be fixed in most libraries that rely on composite relevance scores. you can read more thoughts about it here: https://github.com/leeoniya/uFuzzy#a-biased-appraisal-of-similar-work

https://leeoniya.github.io/uFuzzy/demos/compare.html?libs=uFuzzy,MiniSearch&outOfOrder&interLft=1&search=super%20ma
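To make the point about composite scores concrete, here is a minimal sketch. It is not MiniSearch's scoring formula or uFuzzy's sort; the `Candidate` fields, the weights, and both ranking functions are hypothetical and exist only to show how a small weight change can flip an ordering that a tiered comparison keeps stable.

```typescript
// Hypothetical per-result features for the query "super mario".
// Neither MiniSearch nor uFuzzy exposes these exact fields.
interface Candidate {
  text: string;
  exactTerms: number;    // how many distinct query terms matched a whole word
  termFrequency: number; // total occurrences of any query term in the text
}

const candidates: Candidate[] = [
  { text: "Super Mario World", exactTerms: 2, termFrequency: 2 },
  { text: "super super super super duper", exactTerms: 1, termFrequency: 4 },
];

// Composite approach: fold every signal into one number, then sort by it.
function rankByCompositeScore(items: Candidate[], wExact: number, wFreq: number): string[] {
  const score = (c: Candidate): number => c.exactTerms * wExact + c.termFrequency * wFreq;
  return [...items].sort((a, b) => score(b) - score(a)).map((c) => c.text);
}

// Tiered approach: compare one criterion at a time and only fall through on ties.
function rankByTiers(items: Candidate[]): string[] {
  return [...items]
    .sort((a, b) => b.exactTerms - a.exactTerms || b.termFrequency - a.termFrequency)
    .map((c) => c.text);
}

// A small nudge to the frequency weight is enough to flip the composite ordering.
console.log(rankByCompositeScore(candidates, 1, 0.45)[0]); // "Super Mario World"
console.log(rankByCompositeScore(candidates, 1, 0.55)[0]); // "super super super super duper"
console.log(rankByTiers(candidates)[0]);                   // "Super Mario World"
```

With the tiered comparator, a result that matches more query terms can never be outranked by one that merely repeats a single term, whatever the weights are.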

leeoniya commented 1 year ago

i copied your settings from

https://github.com/scambier/obsidian-omnisearch/blob/c896fd4094759c1739b832c1699810e47afa5146/src/search/omnisearch.ts#L161-L165

into

https://github.com/leeoniya/uFuzzy/blob/15c0809b29f80d60abd6df33b83a499ef0085644/demos/compare.html#L857-L860

and got 0 results with options.prefixLength = 3, and the screenshot above with options.prefixLength = 1 (basically just prefix: true)
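One plausible reading of that result, sketched below under two stated assumptions: that `prefixLength` gates prefix matching by term length (a term shorter than the threshold must match a whole word), and that all query terms must match (AND). The linked Omnisearch lines are not reproduced here, so `toySearch` is a hypothetical stand-in, not the actual MiniSearch call, and it ignores fuzzy matching entirely.

```typescript
// Toy AND search over whitespace-separated tokens.
// ASSUMPTION: a term may prefix-match only if it has at least `prefixLength`
// characters; shorter terms must equal a token exactly.
function toySearch(docs: string[], query: string, prefixLength: number): string[] {
  const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
  return docs.filter((doc) => {
    const tokens = doc.toLowerCase().split(/\s+/).filter(Boolean);
    return terms.every((term) => {
      const allowPrefix = term.length >= prefixLength;
      return tokens.some((tok) => (allowPrefix ? tok.startsWith(term) : tok === term));
    });
  });
}

const sampleDocs = ["Super Mario World", "Super Metroid", "Mario Kart"];

// "ma" has two characters, so with a threshold of 3 it needs an exact token "ma".
console.log(toySearch(sampleDocs, "super ma", 3)); // []
// With a threshold of 1 every term may prefix-match, so "ma" finds "mario".
console.log(toySearch(sampleDocs, "super ma", 1)); // ["Super Mario World"]
```

Under those assumptions, the two-character term "ma" is what drops the result count to zero at `prefixLength = 3`.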

scambier commented 1 year ago

Thanks for those comments and clarifications. From your README:

> In its default configuration (uFuzzy) is likely a poor fit for applications like spellcheck or fulltext/document search.

For context, fulltext search is an important feature for me. My goal with Omnisearch is to allow retrieval of documents from queries that often don't match with filenames or headings. I need it to work in a "good enough" way for varied users who all use Obsidian differently; many of them have folders that contain thousands of (sometimes quite large) documents.

I see how your demo returns relevant results with shorter queries, but I'm mainly worried about performance and result quality for all those users. Since Minisearch is at least "good enough" for them, I'm cautious about swapping it for something else and breaking their workflow.

leeoniya commented 1 year ago

yep, no worries. 👍

for sure there's a tipping point when building an index is going to be more appropriate; "thousands of (sometimes quite large) documents" is definitely one of those cases. for fulltext search, you're better off stemming and doing spelling correction on the needle before continuing with efficient lookups in the index.
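A rough sketch of that pipeline (normalize the needle, then do cheap index lookups) might look like the following. Everything here is illustrative: `stem` is a deliberately naive suffix stripper rather than a real stemmer, `correct` is a brute-force edit-distance scan over the index vocabulary, and none of these functions come from MiniSearch or uFuzzy.

```typescript
// Naive stemmer: strips a trailing "ing" or "s". A real system would use a proper stemmer.
function stem(word: string): string {
  return word.toLowerCase().replace(/(ing|s)$/, "");
}

// Classic single-row Levenshtein distance.
function editDistance(a: string, b: string): number {
  const row: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let diag = row[0];
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = row[j];
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      row[j] = Math.min(above + 1, row[j - 1] + 1, diag + cost);
      diag = above;
    }
  }
  return row[b.length];
}

// Inverted index: stemmed word -> ids of the documents containing it.
function buildIndex(docs: string[]): Map<string, Set<number>> {
  const index = new Map<string, Set<number>>();
  docs.forEach((doc, id) => {
    for (const word of doc.split(/\W+/).filter(Boolean)) {
      const key = stem(word);
      const ids = index.get(key) ?? new Set<number>();
      ids.add(id);
      index.set(key, ids);
    }
  });
  return index;
}

// Spelling correction on the needle: snap an unknown term to the closest indexed key.
function correct(term: string, index: Map<string, Set<number>>, maxDistance = 1): string {
  if (index.has(term)) return term;
  let best = term;
  let bestDistance = maxDistance + 1;
  for (const key of index.keys()) {
    const d = editDistance(term, key);
    if (d < bestDistance) {
      best = key;
      bestDistance = d;
    }
  }
  return best;
}

// Stem and correct each query word, then intersect the posting sets (AND semantics).
function search(query: string, index: Map<string, Set<number>>): number[] {
  const sets = query
    .split(/\W+/)
    .filter(Boolean)
    .map((word) => index.get(correct(stem(word), index)) ?? new Set<number>());
  if (sets.length === 0) return [];
  return [...sets[0]].filter((id) => sets.every((s) => s.has(id)));
}

const notes = ["Searching large notes", "Indexing documents", "Cooking recipes"];
const noteIndex = buildIndex(notes);
console.log(search("serching note", noteIndex)); // [0]
```

The expensive work (tokenizing and stemming every document) happens once at index time; each query only stems and corrects a handful of words before doing map lookups, which is the trade-off that starts to pay off at thousands of large documents.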

if you do end up experimenting with uFuzzy, i can help you with the settings to get the most out of it.

scambier / obsidian-omnisearch

[Feature request] consider moving to uFuzzy #179