lithammer / fuzzysearch

:pig: Tiny and fast fuzzy search in Go
https://pkg.go.dev/github.com/lithammer/fuzzysearch
MIT License
1.13k stars 58 forks source link

Change LevenshteinDistance to Damerau–Levenshtein distance? #41

Open tang-hi opened 2 years ago

tang-hi commented 2 years ago

Hi, Damerau–LevenshteinDistance is the minimum number of edit operations ( insertions, deletions ,substitutions and transposition) required to change one word to another.It has one more action(transposition) compared to LevenshteinDistance. I think Damerau–LevenshteinDistance is more practical than LevenshteinDistance. And it's time complexity is also O(N^2), but unfortunately it's space complexity is O(N^2) while LevenshteinDistance's space complexity is O(N)

Do you think it's an acceptable change? If so, I would like to raise a PR to change LevenshteinDistance to Damerau–Levenshtein distance.

lithammer commented 2 years ago

It would be nice to see some benchmarks comparing them.

tang-hi commented 2 years ago

I have implemented the Damerau–LevenshteinDistance(Optimal string alignment distance), and do some benchmark on them. 1656688418264 It seems like Damerau–LevenshteinDistance is 2x slower than LevenshteinDistance.Do you think it's an acceptable change? If the performance is ok, I will clean the code and raise a pr

benchmark code

lithammer commented 2 years ago

Thinking about this a bit more, I'm reluctant to replace the current implementation with something twice as slow with vague real-world advantages. I think a better approach would be to make the edit distance function pluggable (with the current LevenshteinDistance function as a default).

I will investigate how this can be done without breaking changes (hopefully). And afterwards maybe you can contribute a DamerauLevenshteinDistance function? What do you think?

tang-hi commented 2 years ago

Thinking about this a bit more, I'm reluctant to replace the current implementation with something twice as slow with vague real-world advantages. I think a better approach would be to make the edit distance function pluggable (with the current LevenshteinDistance function as a default).

I will investigate how this can be done without breaking changes (hopefully). And afterwards maybe you can contribute a DamerauLevenshteinDistance function? What do you think?

I think it's good idea.I'm looking forward that