vickumar1981 / stringdistance

A fuzzy matching string distance library for Scala and Java that includes Levenshtein distance, Jaro distance, Jaro-Winkler distance, Dice coefficient, N-Gram similarity, Cosine similarity, Jaccard similarity, Longest common subsequence, Hamming distance, and more.
https://vickumar1981.github.io/stringdistance/api/com/github/vickumar1981/stringdistance/index.html

Bug in Jaro-Winkler score after version 1.1.5 #70

Closed: SpaceCowboyMax closed this issue 2 years ago

SpaceCowboyMax commented 2 years ago

The Jaro-Winkler algorithm returns a wrong score in versions after 1.1.5.

This sample code produces different results depending on the version:

  import com.github.vickumar1981.stringdistance.StringDistance.JaroWinkler
  // Jaro-Winkler similarity between "kkk_k" and "kkk"
  println(JaroWinkler.score("kkk_k", "kkk"))

Version 1.1.5:

0.9066666666666667

Newer versions:

0.30000000000000004
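For context, under the textbook definition (an assumption here, since the library's exact formula is not shown in this thread), the Winkler adjustment boosts a Jaro score j by l * p * (1 - j), where l is the length of the common prefix (capped at 4) and p is a scaling factor, conventionally 0.1. For "kkk_k" vs "kkk", textbook Jaro matching finds 3 common characters and no transpositions, so j = (3/5 + 3/3 + 3/3) / 3, which is approximately 0.8667. With l = 3 the boosted score is approximately 0.9067, consistent with the 1.1.5 output. The newer output, 0.30000000000000004, is exactly what the same formula gives for j = 0, as if no common characters had been found. The sketch below only illustrates that arithmetic; `winkler` and `commonPrefix` are hypothetical helpers, and it applies the boost unconditionally (no 0.7 boost threshold), which is also an assumption.

```scala
object WinklerSketch {
  // Hypothetical helper: applies the Winkler prefix boost to a Jaro score.
  // Assumes scaling factor p = 0.1 and no boost threshold.
  def winkler(jaro: Double, prefixLen: Int, p: Double = 0.1): Double =
    jaro + prefixLen * p * (1 - jaro)

  // Hypothetical helper: length of the common prefix, capped at 4.
  def commonPrefix(a: String, b: String): Int = {
    val limit = math.min(4, math.min(a.length, b.length))
    var i = 0
    while (i < limit && a.charAt(i) == b.charAt(i)) i += 1
    i
  }

  def main(args: Array[String]): Unit = {
    val prefix = commonPrefix("kkk_k", "kkk") // 3
    // Jaro score for 3 common characters and 0 transpositions.
    val jaro = (3.0 / 5 + 3.0 / 3 + 3.0 / 3) / 3
    println(winkler(jaro, prefix)) // approximately 0.9067, like the 1.1.5 output
    println(winkler(0.0, prefix))  // 0.30000000000000004, like the newer output
  }
}
```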

It looks like the problem is in the getCommonChars function of the CommonStringDistanceAlgo interface.
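In the textbook Jaro definition, a character counts as common when the same character appears in the other string within a window of floor(max(|a|, |b|) / 2) - 1 positions, and each character of the other string can be matched at most once. If this step finds no common characters, the Jaro score is 0. The sketch below is an illustrative reimplementation of that textbook step, not the library's `getCommonChars`; the names `matches` and `jaro` are hypothetical. For "kkk_k" vs "kkk" it finds the common characters "kkk" in both strings, giving a Jaro score of 13/15 (approximately 0.8667).

```scala
object JaroSketch {
  // Hypothetical helper: returns the matched characters of `a` and of `b`,
  // each in original order, using the textbook Jaro match window.
  def matches(a: String, b: String): (String, String) = {
    val window = math.max(0, math.max(a.length, b.length) / 2 - 1)
    val aMatched = Array.fill(a.length)(false)
    val bMatched = Array.fill(b.length)(false)
    for (i <- a.indices) {
      val lo = math.max(0, i - window)
      val hi = math.min(b.length - 1, i + window)
      // Match the first unused equal character of `b` inside the window.
      (lo to hi).find(k => !bMatched(k) && b.charAt(k) == a.charAt(i)).foreach { k =>
        aMatched(i) = true
        bMatched(k) = true
      }
    }
    val ca = a.indices.filter(i => aMatched(i)).map(i => a.charAt(i)).mkString
    val cb = b.indices.filter(i => bMatched(i)).map(i => b.charAt(i)).mkString
    (ca, cb)
  }

  // Textbook Jaro similarity computed from the common characters.
  def jaro(a: String, b: String): Double = {
    val (ca, cb) = matches(a, b)
    val m = ca.length
    if (m == 0) 0.0 // no common characters: similarity is 0
    else {
      // Transpositions: half the positions where the matched sequences differ.
      val t = (0 until m).count(i => ca.charAt(i) != cb.charAt(i)) / 2.0
      (m.toDouble / a.length + m.toDouble / b.length + (m - t) / m) / 3
    }
  }

  def main(args: Array[String]): Unit = {
    println(matches("kkk_k", "kkk")) // (kkk,kkk)
    println(jaro("kkk_k", "kkk"))    // approximately 0.8667
  }
}
```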

vickumar1981 commented 2 years ago

👍 @SpaceCowboyMax I can confirm that this broke when the code was refactored to use generalized arrays. Working on a fix. Thanks for reporting the issue and for pinpointing where the bug is.

vickumar1981 commented 2 years ago

@SpaceCowboyMax I published a 1.2.7 release, which should fix the issue; it should sync to Maven Central shortly. I also released a 1.2.8-SNAPSHOT that you can use in the meantime from the snapshot repository:

https://oss.sonatype.org/content/repositories/snapshots/com/github/vickumar1981/stringdistance_2.13/1.2.8-SNAPSHOT/

Let me know if that addresses the issue, and again, thanks for catching and reporting it.
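For sbt users, depending on the fixed release, or on the snapshot from the repository linked above, might look like the following minimal sketch. The coordinates are read off the snapshot URL (group `com.github.vickumar1981`, artifact `stringdistance` cross-built for Scala 2.13); the resolver name is arbitrary.

```scala
// build.sbt (sketch): the fixed release, once it has synced to Maven Central
libraryDependencies += "com.github.vickumar1981" %% "stringdistance" % "1.2.7"

// Alternatively, in the meantime, the snapshot from the Sonatype snapshot repository:
// resolvers += "Sonatype OSS Snapshots" at "https://oss.sonatype.org/content/repositories/snapshots"
// libraryDependencies += "com.github.vickumar1981" %% "stringdistance" % "1.2.8-SNAPSHOT"
```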

vickumar1981 commented 2 years ago

Fixed by commit: https://github.com/vickumar1981/stringdistance/commit/5800a5597297107e23c405a76e622ce927d5c6f1

SpaceCowboyMax commented 2 years ago

Thanks, it looks like it's fixed now.