Closed anschlechter closed 3 years ago
@anschlechter thanks for reporting the issue.
~~the implementation used in the library for longestCommonSeq is here:
https://github.com/vickumar1981/stringdistance/blob/master/src/main/scala/com/github/vickumar1981/stringdistance/impl/LongestCommonSeqImpl.scala#L4~~ (fixed)
Will add that test case and compare implementations. :+1:
@anschlechter published a 1.2.5-SNAPSHOT
with your changes included here: https://github.com/vickumar1981/stringdistance/pull/65
Sonyatype snapshot repo: https://oss.sonatype.org/content/repositories/snapshots/com/github/vickumar1981/
Testing on my end, it looks to be much faster :). Thanks for the great test cases and the implementation fix!
If you could/would possibly pull down or use the 1.2.5-SNAPSHOT
and confirm that this fixes the issue, then I can merge these changes and include them in the 1.2.5
release on maven.
Thanks again! cheers.
Hi, I tried the 1.2.5-SNAPSHOT and it looks good to me. Thank you.
Addressed by: https://github.com/vickumar1981/stringdistance/pull/66
Fixed in version 1.2.5
. Published v1.2.5
artifact to sonatype and should be available on maven central shortly, once it syncs.
longestCommonSeq is not efficient, it takes a very long time if the strings to be compared have a large distance.
The implemented algorithm has exponential complexity O(2^(n+m)).
Expected Result
predictable performance. algorithm should have O(n*m) complexity as is possible.
c.f. https://www.techiedelight.com/longest-common-subsequence/
--> translated to SCALA:
def LCSLength(X: String, Y: String): Int = { val m = X.length val n = Y.length // lookup table stores solution to already computed subproblems, // i.e.,
T[i][j]
stores the length of LCS of substring //X[0…i-1]
andY[0…j-1]
val T = Array.ofDim[Int](m + 1, n + 1) // fill the lookup table in a bottom-up manner for (i <- 1 to m) { for (j <- 1 to n) { if (X.charAt(i - 1) == Y.charAt(j - 1)) { // if the current character ofX
andY
matches T(i)(j) = T(i - 1)(j - 1) + 1 } else { // otherwise, if the current character ofX
andY
don't match T(i)(j) = Integer.max(T(i - 1)(j), T(i)(j - 1)) } } } // LCS will be the last entry in the lookup table T(m)(n) }Actual Result
"unpredictable" runtime
Reproduction Steps
Compare "RUE NELSON MANDELA" with "RUE ALBERT UNGEHEUER" -> it just takes way to long to find out that these are not the same.