lucaong / minisearch

Tiny and powerful JavaScript full-text search engine for browser and Node
https://lucaong.github.io/minisearch/
MIT License
4.64k stars, 133 forks

How to have a search at least as good as `includes` #243

Closed: Barbapapazes closed this issue 8 months ago

Barbapapazes commented 8 months ago

Hello,

I'm using MiniSearch in this project, https://github.com/unjs/website, to filter some data.

However, when I search for the term `x`, the search does not return results like `ipx`. This is annoying, because a simple `includes` would do better on a single field; but MiniSearch can search across multiple fields and do much more than `includes`.

Could you help me? Thanks!

Related to https://github.com/unjs/website/issues/197

lucaong commented 8 months ago

Hi @Barbapapazes, MiniSearch supports exact match (searching for `ipx` would match documents containing the term `ipx`), prefix search (searching for `i` or `ip` would match documents containing the term `ipx`), and fuzzy match (searching for `ipy` would match documents containing the term `ipx`, if the `fuzzy` parameter is high enough). These kinds of full-text matches can be made efficient, both in terms of index memory usage and of computation cost during indexing and search. By default, MiniSearch does not match documents that contain the query term at an arbitrary position (as `includes` would).
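To make the three match types concrete, here is a brute-force TypeScript sketch that checks one query against one indexed term. It only illustrates which query/term pairs each kind of match accepts; MiniSearch does not match this way internally (it looks terms up in an index instead of comparing the query with every term), and `maxDistance` is a hypothetical parameter for this sketch, not MiniSearch's `fuzzy` option.

```typescript
// Exact match: the query and the indexed term are identical.
function exactMatch(query: string, term: string): boolean {
  return query === term;
}

// Prefix match: the indexed term starts with the query.
function prefixMatch(query: string, term: string): boolean {
  return term.startsWith(query);
}

// Levenshtein edit distance (insertions, deletions, substitutions),
// computed row by row with dynamic programming.
function editDistance(a: string, b: string): number {
  // prev[j] holds the distance between the first i-1 chars of a and first j chars of b
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Fuzzy match: the indexed term is within `maxDistance` edits of the query.
// `maxDistance` is a hypothetical parameter used only in this sketch.
function fuzzyMatch(query: string, term: string, maxDistance: number): boolean {
  return editDistance(query, term) <= maxDistance;
}
```

With these definitions, `prefixMatch("ip", "ipx")` and `fuzzyMatch("ipy", "ipx", 1)` are true, but `prefixMatch("x", "ipx")` is false while `"ipx".includes("x")` is true, which is exactly the gap described in the issue.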

MiniSearch has a different goal than `includes`: it is designed for full-text search, providing performant search, relevance scoring, multiple fields, fuzzy match, boosting, and all the features one expects from a full-text search engine, while fitting comfortably in browser memory. `String.prototype.includes`, conversely, performs a "brute force" search anywhere in the string: it matches results that contain the query anywhere, but it is not performant when searching many documents, and since it is not a full-text search engine, it supports neither fuzzy match nor relevance scoring.
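For comparison, a minimal sketch of the `includes` approach over several fields. The `Doc` shape and the sample records are hypothetical, made up for this illustration. Every query scans every field of every document, and the matches come back unranked, with no fuzzy matching.

```typescript
// Hypothetical document shape for this sketch.
interface Doc {
  id: number;
  name: string;
  description: string;
}

// Hypothetical sample records (descriptions invented for illustration).
const sampleDocs: Doc[] = [
  { id: 1, name: "ipx", description: "Image proxy" },
  { id: 2, name: "h3", description: "Minimal HTTP framework" },
  { id: 3, name: "nitro", description: "Next generation server toolkit" },
];

// Brute-force, case-insensitive substring filter across the given fields.
// Cost grows with the total text size on every query; results are not scored.
function includesSearch(docs: Doc[], query: string, fields: (keyof Doc)[]): Doc[] {
  const q = query.toLowerCase();
  return docs.filter((doc) =>
    fields.some((field) => String(doc[field]).toLowerCase().includes(q))
  );
}
```

Searching `sampleDocs` for `"x"` across `name` and `description` returns both the `ipx` and the `nitro` records, the latter only because its description contains the word "Next".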

That said, if you need to match your query to arbitrary positions (even in the middle of a term), there is a way to achieve that in MiniSearch. It is explained in this comment. The drawback is that it will make the index larger, as it has to index all suffixes of each term.
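A minimal sketch of the suffix idea, assuming a minimum suffix length parameter. `suffixes` and `matchesViaSuffixes` are hypothetical helpers written for illustration, not part of MiniSearch or of the linked example. If every suffix of a term is indexed, a prefix match against any suffix finds the query anywhere in the term.

```typescript
// Returns every suffix of `term` (lowercased) with at least `minLength` characters.
// Design choice of this sketch: a term shorter than `minLength` is kept whole,
// so it stays searchable.
function suffixes(term: string, minLength: number): string[] {
  const t = term.toLowerCase();
  if (t.length < minLength) {
    return [t];
  }
  const result: string[] = [];
  for (let i = 0; i <= t.length - minLength; i++) {
    result.push(t.slice(i));
  }
  return result;
}

// Toy check: the query matches the term if any indexed suffix starts with it,
// i.e. a prefix search over the suffix "index" of a single term.
function matchesViaSuffixes(query: string, term: string, minLength: number): boolean {
  const q = query.toLowerCase();
  return suffixes(term, minLength).some((s) => s.startsWith(q));
}
```

For queries at least `minLength` characters long, `matchesViaSuffixes` agrees with `term.includes(query)`, because a substring of length k always starts a suffix of length at least k. In MiniSearch, a function like `suffixes` would presumably be supplied as the index-time `processTerm` (recent versions allow it to return an array of terms), while the search-time `processTerm` under `searchOptions` should not expand the query, and searches would use `prefix: true`. Treat that wiring as an assumption and check the MiniSearch documentation for your version.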

In sum: it is possible to achieve what you describe with MiniSearch, but at the cost of a larger index in memory. The link above provides an example. My suggestion, though, is to think whether this is really needed in your case: is it reasonable to expect to find documents containing the term ipx when searching for just the character x? That would mean returning a lot more results, basically any document containing the character x. Is that useful? Most search engines won't do that, as it seems counterproductive for many use cases, but if your case requires it you can follow the example I linked above (and set the minimum suffix length to 1 to make it work with a single character).
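To put a rough number on "a larger index": indexing every suffix of length at least `m` of a term of length `n` produces `n - m + 1` entries instead of one. The sketch below counts those entries for a list of terms. It is a back-of-the-envelope count of index entries under that assumption, not a measurement of MiniSearch's memory use, which also depends on how terms are stored and shared across documents; the term list is made up.

```typescript
// Entries produced by one term of length `n` when every suffix with at least
// `minLength` characters is indexed. A term shorter than `minLength` is
// assumed to be indexed whole, so it counts once.
function suffixEntryCount(n: number, minLength: number): number {
  return n >= minLength ? n - minLength + 1 : 1;
}

// Total entries for a list of terms; without suffix indexing this would be terms.length.
function totalEntries(terms: string[], minLength: number): number {
  return terms.reduce((sum, term) => sum + suffixEntryCount(term.length, minLength), 0);
}
```

For the terms `unjs`, `website` and `ipx`, `totalEntries` gives 14 entries with a minimum suffix length of 1 and 8 with a minimum of 3, compared with 3 entries without suffixes. Lowering the minimum to 1 is what makes single-character queries work, and it is also the most expensive setting.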

Perhaps you need to match in the middle of terms only for the package name, and not for other fields? If so, you can apply the custom `processTerm` from the example only to the package name, and use the default processor for the other fields.
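A minimal sketch of a field-dependent term processor. It assumes that MiniSearch calls the index-time `processTerm` with the term and the field name as arguments and accepts an array of terms as the result; verify both against the documentation for your version. The field name `name` and the constants are hypothetical, and plain lowercasing is used as a stand-in for the default processing.

```typescript
// Hypothetical configuration: only these fields get suffix expansion.
const SUFFIX_FIELDS = new Set<string>(["name"]);
const MIN_SUFFIX_LENGTH = 1;

// Index-time term processor: expands terms into lowercased suffixes for the
// selected fields, and only lowercases terms for every other field.
function processTermByField(term: string, fieldName?: string): string | string[] {
  const lower = term.toLowerCase();
  if (fieldName === undefined || !SUFFIX_FIELDS.has(fieldName)) {
    return lower;
  }
  const result: string[] = [];
  for (let i = 0; i <= lower.length - MIN_SUFFIX_LENGTH; i++) {
    result.push(lower.slice(i));
  }
  return result;
}
```

At search time the query would only be lowercased, not expanded (for example through `searchOptions.processTerm`), and searched with `prefix: true`, so that `x` can reach the `x` suffix of `ipx` in the `name` field while other fields keep ordinary term matching.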

lucaong commented 8 months ago

@Barbapapazes I think the question was answered, so I will close the issue, but feel free to comment further if needed.

Barbapapazes commented 8 months ago


Hey, thanks for your clear answer! 💛

> That would mean returning a lot more results, basically any document containing the character x.

I hadn't thought of that, and that's clearly not the behavior I want.

> Perhaps you need to match in the middle of terms only for the package name, but not for other fields? If so, you can apply the custom processTerm from the example just for the package name, and use the default processor for the other fields.

I will take a look at this to see if it can help!

Once again, thanks for your answer!