manticoresoftware / manticoresearch

Easy to use open source fast database for search | Good alternative to Elasticsearch now | Drop-in replacement for E in the ELK soon
https://manticoresearch.com
GNU General Public License v3.0

Add Elastic-like fuzzy queries to Manticoresearch #1497

Closed: Anish2 closed this issue 2 weeks ago

Anish2 commented 11 months ago

Is your feature request related to a problem? Please describe.

Currently there is no way to run a fuzzy query per search term in a match query. By fuzzy query I mean an automatic Levenshtein-distance search, as in Elasticsearch (https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-fuzzy-query.html).

Describe the solution you'd like

match('whte~2 hores~2') should match "white horse".

Describe alternatives you've considered

Currently we can find similar words either in an external word store or with CALL QSUGGEST(), and then search for the words within a certain Levenshtein distance. However, this is not built into the query and does not search all possible variations.
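To make the requested behavior concrete, here is a minimal Python sketch of per-term Levenshtein matching. It is only an illustration of the semantics asked for above, not how Manticore or Elasticsearch implement it. The `term~N` syntax is the one proposed in this request, and the helper names (`levenshtein`, `parse_fuzzy_query`, `matches`) are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance: insertion, deletion, and substitution cost 1 each."""
    # prev[j] is the distance between the already processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def parse_fuzzy_query(query: str) -> list[tuple[str, int]]:
    """Split 'whte~2 hores~2' into [('whte', 2), ('hores', 2)] (hypothetical syntax)."""
    parsed = []
    for token in query.split():
        term, _, dist = token.partition("~")
        parsed.append((term, int(dist) if dist else 0))
    return parsed


def matches(query: str, text: str) -> bool:
    """True if every query term is within its allowed distance of some word in text."""
    words = text.lower().split()
    return all(
        any(levenshtein(term.lower(), word) <= max_dist for word in words)
        for term, max_dist in parse_fuzzy_query(query)
    )


print(matches("whte~2 hores~2", "white horse"))
```

Note that plain Levenshtein distance counts the swapped letters in "hores" as two edits, which is why the example needs `~2` rather than `~1` for that term.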

tomatolog commented 11 months ago

It is a good feature. My only doubt is that the CALL SUGGEST statement has 6 options related to selecting the best-matched term, and I see no way to use these options per term in the suggested example match('whte~2').

Anish2 commented 11 months ago

@tomatolog Yes, we probably cannot use CALL SUGGEST, but is there a way to support these general fuzzy queries in Manticore without an external store of words (that is, without computing the variations externally)?

This is currently a significant limitation compared to Elasticsearch and other libraries (such as Meilisearch).

sanikolaev commented 11 months ago

"This is currently a significant limitation compared to Elasticsearch and other libraries (such as Meilisearch)"

True. We'll think about how we can improve and simplify this functionality.

sanikolaev commented 11 months ago

Worth checking:

donhardman commented 3 months ago

The initial version of fuzzy search was implemented in this pull request: https://github.com/manticoresoftware/manticoresearch-buddy/pull/297

donhardman commented 3 months ago

Query suggestion implementation: https://github.com/manticoresoftware/manticoresearch-buddy/pull/299

Both now support HTTP endpoints as well.

donhardman commented 2 months ago

As we discussed on our call earlier, we've decided to enhance the logic and split the SQL/JSON handling as it comes from the request.

We'll also parse the match fields to identify 'query_string', handle it by expanding it into a JSON query with bool/should clauses, and add validation for unsupported operators in the query with EXPLAIN QUERY.

WIP: https://github.com/manticoresoftware/manticoresearch-buddy/pull/309
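As a rough illustration of the expansion described above, the following Python sketch builds a JSON search body in which each original term becomes a group of alternative spellings. The function name `expand_to_bool_should` and the candidate lists are hypothetical, and the exact shape Buddy produces is an assumption. In particular, the nesting of bool/should inside bool/must is a guess at how "all terms, any variant" could be expressed.

```python
import json


def expand_to_bool_should(table, variants_per_term, field="*"):
    """Build a JSON search body from candidate spellings.

    variants_per_term holds one list of candidate words per original query term.
    Every term has to match (must); any one of its variants satisfies it (should).
    """
    must = []
    for variants in variants_per_term:
        should = [{"match": {field: variant}} for variant in variants]
        must.append({"bool": {"should": should}})
    return {"index": table, "query": {"bool": {"must": must}}}


# Hypothetical candidates for the query 'whte hores'.
body = expand_to_bool_should("table", [["white", "whte"], ["horse", "hores"]])
print(json.dumps(body, indent=2))
```

How the candidate spellings are generated and ranked is left out here, since that part is handled by the suggestion logic discussed earlier in the thread.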

donhardman commented 2 months ago

All improvements to fuzzy logic and query suggestions have been implemented in the following pull requests:

donhardman commented 2 months ago

As we discussed today, we're going to add logic to control the fuzziness, similar to Elastic's approach, but with our own interface.

The Levenshtein distance we'll accept should be in the list [0, 1, 2].

By default, we should implement:

We should also consider adding 'append' and 'prepend' parameters to the 'CALL AUTOCOMPLETE' logic.

donhardman commented 1 month ago

All fixes have been implemented and are ready for review: https://github.com/manticoresoftware/manticoresearch-buddy/pull/316

donhardman commented 1 month ago

Hurray! 🎉

The initial version has been merged into the main branch and is now available as a showcase at https://github.manticoresearch.com

Here's what you should know about it:

Fuzzy Search:

To use it, simply add OPTION fuzzy=1. For example, in SQL, you can do something like: SELECT * FROM table WHERE MATCH('helo') OPTION fuzzy=1

That is all it takes. There are some extra parameters to play with, such as layouts, which is on by default. You can pass something like layouts='us,ru' to use only those two layouts, or layouts='' to disable it completely.

There's currently a limitation in SQL queries where you can only use 'match' without operators. If you want more flexibility, it's a good idea to try sending an HTTP request instead.
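For anyone generating these statements from application code, here is a small Python sketch that assembles the SELECT shown above. The helper names `escape_sql_string` and `build_fuzzy_select` are hypothetical. Joining several options with commas after OPTION and escaping quotes with a backslash are assumptions based on general Manticore SQL syntax, so check them against the documentation once it lands.

```python
def escape_sql_string(value: str) -> str:
    """Backslash-escape backslashes and single quotes (assumed escaping rules)."""
    return value.replace("\\", "\\\\").replace("'", "\\'")


def build_fuzzy_select(table, text, layouts=None):
    """Return a SELECT with OPTION fuzzy=1 and, optionally, a layouts option.

    layouts=None leaves the server default; an empty list renders layouts=''.
    """
    options = ["fuzzy=1"]
    if layouts is not None:
        options.append("layouts='{}'".format(",".join(layouts)))
    return "SELECT * FROM {} WHERE MATCH('{}') OPTION {}".format(
        table, escape_sql_string(text), ", ".join(options)
    )


print(build_fuzzy_select("table", "helo"))
print(build_fuzzy_select("table", "helo", layouts=["us", "ru"]))
print(build_fuzzy_select("table", "helo", layouts=[]))
```

The sketch passes the search text through as a plain phrase, in line with the current SQL limitation that MATCH cannot contain operators when fuzzy is enabled.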

Query Suggestions:

You can call this with SQL using CALL AUTOCOMPLETE('he', 'table'). There are extra parameters available, similar to other procedures in Manticore, such as: 1 AS fuzziness, 'ru,en' AS layouts, 1 AS append, 1 AS prepend, 1 AS expansion_len
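The call above can be assembled programmatically as in the Python sketch below. The helper names `quote` and `build_autocomplete_call` are hypothetical, and the option names are taken from the list above. Restricting fuzziness to 0, 1, or 2 is an assumption carried over from the earlier comment about accepted Levenshtein distances, and the quoting rules are assumed as well.

```python
KNOWN_OPTIONS = ("fuzziness", "layouts", "append", "prepend", "expansion_len")


def quote(value: str) -> str:
    """Wrap a string in single quotes with backslash escaping (assumed rules)."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def build_autocomplete_call(prefix, table, **options):
    """Render CALL AUTOCOMPLETE('prefix', 'table', <value> AS <option>, ...)."""
    args = [quote(prefix), quote(table)]
    for name, value in options.items():
        if name not in KNOWN_OPTIONS:
            raise ValueError("unknown option: {}".format(name))
        if name == "fuzziness" and value not in (0, 1, 2):
            raise ValueError("fuzziness must be 0, 1, or 2")
        # Strings are quoted; everything else is rendered as an integer.
        rendered = quote(value) if isinstance(value, str) else str(int(value))
        args.append("{} AS {}".format(rendered, name))
    return "CALL AUTOCOMPLETE({})".format(", ".join(args))


print(build_autocomplete_call("he", "table"))
print(build_autocomplete_call("he", "table", fuzziness=1, layouts="ru,en"))
```

Options are emitted in the order they are passed, so the second call renders the fuzziness argument before the layouts argument.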

Documentation is on the way!

P.S. @PavelShilin89 Let's figure out what tests we need to cover all cases and move forward with them.

PavelShilin89 commented 3 weeks ago

Done in https://github.com/manticoresoftware/manticoresearch/pull/2519