wolfgarbe / SymSpell

SymSpell: 1 million times faster spelling correction & fuzzy search through Symmetric Delete spelling correction algorithm
https://seekstorm.com/blog/1000x-spelling-correction/
MIT License

Support for weighted edit distance #43

Open heatherleaf opened 6 years ago

heatherleaf commented 6 years ago

I'm not sure if SymSpell already has support for weighted edit distance. If so, please tell me how to use it.

Otherwise, I suggest adding it as another distance metric alongside Levenshtein and Damerau-Levenshtein. The implementation itself shouldn't be problematic: just use a weight matrix instead of the default unit cost. The matrix would be passed to the constructor, and for command-line use it could be stored in a file. (I could in principle do it myself, but I don't know C#.)
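
To illustrate the idea, here is a minimal Python sketch (not C#, and not part of SymSpell) of a Levenshtein distance in which the substitution cost comes from a caller-supplied weight function instead of the unit cost. The names `weighted_levenshtein`, `keyboard_cost`, and `CHEAP_PAIRS`, as well as the weights themselves, are assumptions made up for this example.

```python
from typing import Callable


def weighted_levenshtein(
    source: str,
    target: str,
    substitution_cost: Callable[[str, str], float] = lambda a, b: 1.0,
    insertion_cost: float = 1.0,
    deletion_cost: float = 1.0,
) -> float:
    """Levenshtein distance with configurable operation costs.

    With the default arguments this reduces to the ordinary unit-cost distance.
    """
    # prev[j] is the cost of turning the processed prefix of source into target[:j].
    prev = [j * insertion_cost for j in range(len(target) + 1)]
    for i, s_char in enumerate(source, start=1):
        curr = [i * deletion_cost]
        for j, t_char in enumerate(target, start=1):
            sub = 0.0 if s_char == t_char else substitution_cost(s_char, t_char)
            curr.append(min(
                prev[j] + deletion_cost,       # delete s_char
                curr[j - 1] + insertion_cost,  # insert t_char
                prev[j - 1] + sub,             # substitute or match
            ))
        prev = curr
    return prev[-1]


# Hypothetical weight table: these character pairs are "cheap" to confuse.
CHEAP_PAIRS = {frozenset("ao"): 0.5, frozenset("nm"): 0.5}


def keyboard_cost(a: str, b: str) -> float:
    """Look up the substitution cost for a pair of characters (default 1.0)."""
    return CHEAP_PAIRS.get(frozenset((a, b)), 1.0)


if __name__ == "__main__":
    print(weighted_levenshtein("cat", "cot", keyboard_cost))
    print(weighted_levenshtein("cat", "cut", keyboard_cost))
```

A full weight matrix loaded from a file could replace `CHEAP_PAIRS`; the function only needs something that maps a character pair to a cost.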

wolfgarbe commented 6 years ago
  1. There is a third-party SymSpell implementation with weighted Damerau-Levenshtein edit distance / keyboard-distance: https://github.com/searchhub/preDict

  2. Weighted edit distance can also be added as a post-processing step. The preliminary SymSpell results could be filtered/re-sorted according to your preferences.

  3. It is planned to add a weighted edit distance to SymSpell in the future, but there is no timeline yet.
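
The post-processing approach in point 2 could look roughly like the Python sketch below. It is only an illustration under assumptions: `candidates` stands in for whatever terms a preliminary unweighted lookup returned, and `rerank` and `toy_weighted_distance` are hypothetical helpers, not SymSpell functions.

```python
from typing import Callable, Iterable, List, Tuple


def toy_weighted_distance(a: str, b: str) -> float:
    """Weighted Levenshtein where swapping 'a' and 'o' costs 0.5; every other edit costs 1."""
    prev = [float(j) for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [float(i)]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                sub = 0.0
            elif {ca, cb} == {"a", "o"}:
                sub = 0.5
            else:
                sub = 1.0
            curr.append(min(prev[j] + 1.0, curr[j - 1] + 1.0, prev[j - 1] + sub))
        prev = curr
    return prev[-1]


def rerank(
    query: str,
    candidates: Iterable[str],
    weighted_distance: Callable[[str, str], float],
    max_weighted_distance: float,
) -> List[Tuple[str, float]]:
    """Filter candidates by weighted cost, then sort by (cost, term)."""
    scored = [(term, weighted_distance(query, term)) for term in candidates]
    kept = [(term, cost) for term, cost in scored if cost <= max_weighted_distance]
    return sorted(kept, key=lambda pair: (pair[1], pair[0]))


if __name__ == "__main__":
    # Pretend these terms came back from an unweighted lookup for "cat".
    preliminary = ["cut", "cot", "cart", "dog"]
    for term, cost in rerank("cat", preliminary, toy_weighted_distance, 1.0):
        print(term, cost)
```

The preliminary lookup still decides which candidates are seen at all, so its unweighted maximum distance has to be generous enough not to drop terms that are cheap under the weighted metric.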

heatherleaf commented 6 years ago
  1. Thanks, I'll look into that!
  2. Yes, but then the maximum edit distance parameter becomes a problem: I would like it to be the true weighted cost; otherwise I have to set it to an unnecessarily large value.
  3. I hope you'll get the time for it.
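
One way to read the concern in point 2, sketched in Python under stated assumptions: if every edit operation costs at least some minimum weight, a term within weighted cost `C` needs at most `floor(C / w_min)` edits, so the unweighted lookup has to allow that many edits to avoid missing it. The helper name `required_unweighted_max_distance` is hypothetical, and the bound assumes the weighted and unweighted metrics use the same set of operations.

```python
import math


def required_unweighted_max_distance(
    max_weighted_cost: float,
    min_operation_cost: float,
) -> int:
    """Smallest unweighted max edit distance that cannot miss any term
    whose weighted cost is at most max_weighted_cost.

    Assumes every single edit costs at least min_operation_cost (> 0).
    """
    if min_operation_cost <= 0:
        raise ValueError("min_operation_cost must be positive")
    # The small epsilon guards against float round-off such as 0.3 / 0.1 < 3.
    return math.floor(max_weighted_cost / min_operation_cost + 1e-9)


if __name__ == "__main__":
    # A weighted budget of 1.0 with edits as cheap as 0.25 needs 4 unweighted edits.
    print(required_unweighted_max_distance(1.0, 0.25))
```

Because the candidate set typically grows quickly with the unweighted maximum distance, cheap operations make the post-processing route increasingly expensive, which is the motivation for supporting the weighted cost as the parameter directly.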