rockymadden / stringmetric

:dart: String metrics and phonetic algorithms for Scala (e.g. Dice/Sorensen, Hamming, Jaccard, Jaro, Jaro-Winkler, Levenshtein, Metaphone, N-Gram, NYSIIS, Overlap, Ratcliff/Obershelp, Refined NYSIIS, Refined Soundex, Soundex, Weighted Levenshtein).
https://rockymadden.com/stringmetric/
486 stars 81 forks source link

Levenshtein distance: change recursion to dynamic programming #28

Open ctongfei opened 7 years ago

ctongfei commented 7 years ago

The current implementation is too slow: it is a recursive algorithm without memoization, which means a lot of states are repeatedly computed. Try this dynamic programming version that guarantees O(mn) performance:

  def dist(x: String, y: String): Int = {
    val d = Array.ofDim[Int](x.length + 1, y.length + 1)
    for (i <- 0 to x.length) d(i)(0) = i
    for (j <- 0 to y.length) d(0)(j) = j
    for (j <- 1 to y.length; i <- 1 to x.length) {
      if (x(i - 1) == y(j - 1)) d(i)(j) = d(i - 1)(j - 1)
      else d(i)(j) = min(d(i - 1)(j), d(i)(j - 1), d(i - 1)(j - 1)) + 1
    }
    d(x.length)(y.length)
  }