Closed LukasBreitwieser closed 4 years ago
I think this is due to stemming.
By default, and this is how the demo is set up, all words are stemmed before entering the index. As a result, neither "admiral" nor "addition" is stored as-is in the index; only their stemmed forms are.
It is these words that are then used when calculating the edit distance, which is why "addition" matches a search for "admiral~2".
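To make the effect concrete, here is a minimal sketch that compares the edit distance of the raw words with that of their stems. It uses a plain textbook Levenshtein function, not lunr's internal fuzzy matching, and the stems "admir" and "addit" are an assumption: they are what a Porter-style stemmer should produce for these two words, hard-coded here rather than taken from lunr's output.

```javascript
// Textbook dynamic-programming Levenshtein distance:
// insertions, deletions and substitutions each cost 1.
function levenshtein(a, b) {
  // prev[j] holds the distance between the first i-1 chars of a and the first j chars of b.
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,       // delete a[i-1]
        curr[j - 1] + 1,   // insert b[j-1]
        prev[j - 1] + cost // substitute, or keep a matching character
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Assumed stems (hypothetical, hard-coded for illustration).
const stems = { admiral: "admir", addition: "addit" };

const rawDistance = levenshtein("admiral", "addition");
const stemmedDistance = levenshtein(stems.admiral, stems.addition);

console.log(rawDistance, stemmedDistance); // prints "5 2"
```

Under that assumption, the raw words are 5 edits apart, but the stems differ only by two substitutions (m to d, r to t), so they fall within an edit distance of 2.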
Stemming has been the cause of much confusion in the past (just take a look through some of the closed issues) but I think this is the first time I've seen it combine with an edit distance search to produce 'wrong' results.
Disabling the stemmer would stop returning 'wrong' results like this, though it would probably lead to other unexpected results in other searches.
So, lunr doesn't use Soundex, and I'm not sure this is technically a bug; it's more like one of the many compromises involved in implementing full-text search.
Hi Oliver,
Great project! I played around a bit with fuzzy search and observed the following behavior:
Initially, I thought it was a bug, since "admiral" and "addition" have a Levenshtein distance of 5. However, I looked a bit further and came across the Soundex distance. According to [1], "admiral" and "addition" have a Soundex distance of 2.
Are you using Soundex, or is this a bug? If you are using Soundex, is there a way to use Levenshtein distance instead?
Lukas
[1] http://www.ripelacunae.net/projects/levenshtein/