Open rsvoboda opened 1 month ago
Hey,
This would be a nice feature indeed.
there is an existing mapping for common English misspelled words, ...
I don't think a hard coded list will work, no. Fortunately, there are other solutions :)
We need to consider two things IMO: how to match "approximately", and when to match "approximately".
Fuzzy queries (which allow terms with one or two typos) are a thing, but I'd personally stay away from them, because:
A better approach is to have dedicated fields using an ngram analyzer, e.g. turn tokens into a list of 3-grams:
aplication
=> [apl, pli, lic, ica, cat, ati, tio, ion]
application
=> [app, ppl, pli, lic, ica, cat, ati, tio, ion]
The two share `[pli, lic, ica, cat, ati, tio, ion]`; that's enough to get a good score!
We could do an "OR" between the current search criteria and the new "fuzzy" ones, but this means that, when searching without typos, we will return a long tail of potentially irrelevant results.
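To make the 3-gram idea above concrete, here is a minimal plain-Java sketch of the decomposition and of the overlap between a misspelled query and an indexed word. It is only an illustration: the class and method names (`NgramSketch`, `trigrams`, `shared`) are hypothetical, and in a real setup the search backend's ngram analyzer would presumably do this at indexing and query time rather than application code.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustration only: names are hypothetical, not part of any search library.
class NgramSketch {

    // Splits a token into overlapping 3-character windows.
    // Tokens shorter than 3 characters produce no grams.
    static List<String> trigrams(String token) {
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + 3 <= token.length(); i++) {
            grams.add(token.substring(i, i + 3));
        }
        return grams;
    }

    // Returns the distinct 3-grams of `query` that also occur in `indexed`,
    // in the order they appear in the query.
    static Set<String> shared(String query, String indexed) {
        Set<String> common = new LinkedHashSet<>(trigrams(query));
        common.retainAll(trigrams(indexed));
        return common;
    }

    public static void main(String[] args) {
        System.out.println(trigrams("aplication"));
        System.out.println(trigrams("application"));
        System.out.println(shared("aplication", "application"));
    }
}
```

Running `main` prints the 3-grams of both spellings and their overlap; the larger the overlap relative to the query, the better a typo'd query should score against the correctly spelled word.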
A perhaps better solution would be to run the search without typo support first and, only if that search doesn't match anything, discard it and run a second, fuzzier search with typo support, then return the results of that second search.
I tried to explain how to do ngram search here: https://discourse.hibernate.org/t/slop-does-not-work-for-any-word/9253/6?u=yrodiere
As I mentioned above though, we probably don't want to put all predicates in the same query, but rather do something like this:
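// First pass: strict search, no typo tolerance.
var results = doSearchWithoutTypoSupport(params);
if (results.total().hitCountLowerBound() == 0) {
    // Nothing matched: retry with the ngram-based, typo-tolerant search.
    results = doSearchWithTypoSupportUsingNgrams(params);
}
return results;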
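For anyone who wants to try the two-step flow outside the actual search service, here is a self-contained sketch of the same idea. It assumes each search can be wrapped in a `Supplier`; `FallbackSearchSketch` and `searchWithFallback` are hypothetical names, and the hard-coded lists merely stand in for real queries. The point it illustrates is that the typo-tolerant search is never executed when the strict one already has hits.

```java
import java.util.List;
import java.util.function.Supplier;

// Illustration only: names are hypothetical, not the actual search service code.
class FallbackSearchSketch {

    // Runs the strict search first; the typo-tolerant supplier is only
    // invoked when the strict search returns no hits at all.
    static <T> List<T> searchWithFallback(Supplier<List<T>> strict,
                                          Supplier<List<T>> typoTolerant) {
        List<T> hits = strict.get();
        return hits.isEmpty() ? typoTolerant.get() : hits;
    }

    public static void main(String[] args) {
        // Stand-ins for real queries: the strict search finds nothing for
        // "aplication", so the typo-tolerant one is consulted.
        List<String> results = FallbackSearchSketch.<String>searchWithFallback(
                () -> List.of(),
                () -> List.of("Configuring your application"));
        System.out.println(results);
    }
}
```

One consequence of this design is that a query without typos keeps returning exactly what it returns today, since the second search only runs on an empty result.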
PRs welcome :)
I have an enhancement proposal to be permissive about typos when searching.
Here is an example: https://quarkus.io/guides/#q=aplication gives
Sorry, no guides matched your search. Please try again.
Same for https://quarkus.io/guides/#q=Configuring+your+application vs. https://quarkus.io/guides/#q=Configuring+your+aplication
Is there a way to tolerate typos? They are quite common, especially for non-native speakers.
Perhaps some approximation (I think there was something for it in HS), or maybe there is an existing mapping for common English misspelled words, ...