Closed Anish2 closed 2 weeks ago
It is a good feature. My only doubt is that the call suggest statement has 6 options related to selecting the best-matched term, and I see no way to use these options per term in the suggested example match('whte~2').
@tomatolog Yes, we probably cannot use call suggest, but is there a way to allow these general fuzzy queries in Manticore without using an external store of words (computing variations externally)?
This is currently a significant limitation compared to Elasticsearch and other libraries (such as Meilisearch).
True. We'll think about how we can improve and simplify this functionality.
Worth checking: dev2 (takes about 0.5 sec to convert whte hrse to white horse, but haven't tried to tune it)
The initial version of fuzzy search was implemented in this pull request: https://github.com/manticoresoftware/manticoresearch-buddy/pull/297
Query suggestion implementation: https://github.com/manticoresoftware/manticoresearch-buddy/pull/299
Both now support HTTP endpoints as well.
As we discussed on our call earlier, we've decided to enhance the logic and split the SQL/JSON handling as it comes from the request.
We'll also parse the match fields to identify 'query_string', handle it by expanding it into a JSON query with bool/should queries, and validate the query for unsupported operators with an explain query.
WIP: https://github.com/manticoresoftware/manticoresearch-buddy/pull/309
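To make the bool/should expansion mentioned above more concrete, here is a minimal Python sketch. It is not the Buddy implementation: the function name `expand_to_bool_should`, the shape of the `variations` input, and the choice to wrap one `should` group per term inside an outer `must` are all assumptions made for illustration.

```python
import json


def expand_to_bool_should(variations: dict[str, list[str]], field: str = "*") -> dict:
    """Build a bool query: every term must match, via any one of its variations.

    `variations` maps each original term to its candidate corrections
    (hypothetical input; how candidates are produced is out of scope here).
    """
    must = []
    for term, candidates in variations.items():
        # Fall back to the original term when no variation was found.
        options = candidates or [term]
        should = [{"match": {field: option}} for option in options]
        must.append({"bool": {"should": should}})
    return {"bool": {"must": must}}


if __name__ == "__main__":
    query = expand_to_bool_should({"whte": ["white", "what"], "hrse": ["horse"]})
    print(json.dumps(query, indent=2))
```

Grouping the variations of one term under `should` keeps them as alternatives, while the outer `must` still requires every original term to be represented.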
All improvements to fuzzy logic and query suggestions have been implemented in the following pull requests:
As we discussed today, we're going to add logic to control the fuzziness, similar to Elastic's approach, but with our own interface.
The Levenshtein distance we'll accept should be in the list [0, 1, 2].
By default, we should implement:
We should also consider adding 'append' and 'prepend' parameters to the 'CALL AUTOCOMPLETE' logic.
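For readers unfamiliar with the distance values listed above, the following Python sketch computes the classic Levenshtein distance and rejects values outside [0, 1, 2]. The helper names `levenshtein` and `within_fuzziness` are hypothetical and only illustrate the idea; this is not how Manticore computes it internally.

```python
ALLOWED_DISTANCES = (0, 1, 2)  # the accepted values named in the comment above


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character inserts, deletes and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep on a match
            ))
        previous = current
    return previous[-1]


def within_fuzziness(term: str, candidate: str, distance: int) -> bool:
    """True when `candidate` is at most `distance` edits away from `term`."""
    if distance not in ALLOWED_DISTANCES:
        raise ValueError(f"distance must be one of {ALLOWED_DISTANCES}")
    return levenshtein(term, candidate) <= distance


if __name__ == "__main__":
    for typo, word in [("whte", "white"), ("hores", "horse"), ("helo", "hello")]:
        print(typo, word, levenshtein(typo, word))
```

Note that under plain Levenshtein a swap of two adjacent letters, as in "hores" versus "horse", counts as two edits, so it needs a distance of 2 to match.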
All fixes have been implemented and are ready for review: https://github.com/manticoresoftware/manticoresearch-buddy/pull/316
Hurray! 🎉
The initial version has been merged into the main branch and is now available as a showcase at https://github.manticoresearch.com
Here's what you should know about it:
Fuzzy Search:
To use it, simply add OPTION fuzzy=1. For example, in SQL, you can do something like:
SELECT * FROM table WHERE MATCH('helo') OPTION fuzzy=1
This will do the job. There are some extra parameters to play with, like layouts. Layout handling is on by default. You can pass a string of layouts to restrict it, for example layouts='us,ru' to use only two layouts, or layouts='' to disable it completely.
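The layouts parameter refers to keyboard layouts, meaning text typed with the wrong layout active. As a rough illustration of the idea only, the Python sketch below remaps letters between the US and Russian layouts. The tables cover letter keys only, and the names `TABLES` and `layout_variants` are hypothetical; the real implementation may work quite differently.

```python
# Letter keys in the same physical order on both layouts (punctuation keys omitted).
RU_KEYS = "йцукенгшщзфывапролдячсмить"
US_KEYS = "qwertyuiopasdfghjklzxcvbnm"

TABLES = {
    ("ru", "us"): str.maketrans(RU_KEYS, US_KEYS),
    ("us", "ru"): str.maketrans(US_KEYS, RU_KEYS),
}


def layout_variants(text: str, layouts: str = "us,ru") -> list[str]:
    """Return the text plus its remapped forms for each enabled layout pair.

    An empty `layouts` string disables remapping, mirroring layouts=''.
    """
    names = [name.strip() for name in layouts.split(",") if name.strip()]
    variants = [text]
    for source in names:
        for target in names:
            table = TABLES.get((source, target))
            if table is None:
                continue  # same layout, or a pair this sketch does not cover
            remapped = text.translate(table)
            if remapped not in variants:
                variants.append(remapped)
    return variants


if __name__ == "__main__":
    print(layout_variants("руддщ"))      # includes the US-layout reading
    print(layout_variants("руддщ", ""))  # remapping disabled
```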
There's currently a limitation in SQL queries where you can only use 'match' without operators. If you want more flexibility, it's a good idea to try sending an HTTP request instead.
Query Suggestions:
You can call this with SQL using CALL AUTOCOMPLETE('he', 'table'). There are extra parameters available, similar to other procedures in Manticore, such as:
1 AS fuzziness, 'ru,en' AS layouts, 1 AS append, 1 AS prepend, 1 AS expansion_len
Documentation is on the way!
P.S. @PavelShilin89 Let's figure out what tests we need to cover all cases and move forward with them.
Is your feature request related to a problem? Please describe.
Currently there's no way to do fuzzy queries per search term in a match query. By fuzzy query I mean an automatic Levenshtein distance search (https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-fuzzy-query.html).
Describe the solution you'd like
match('whte~2 hores~2') should match "white horse".
Describe alternatives you've considered
Currently we can find similar words either by using an external store or by calling CALL QSUGGEST(), which also returns similar words. We can use this to search for similar words within a certain Levenshtein distance. However, this is not built into the query and does not search all possible variations.