Hi,

I'm seeing a strange result from the jaro_winkler function, which looks like a bug:
In [73]: Levenshtein.jaro_winkler('guerrilla girls', 'guerilla girls')
Out[73]: 0.9295238095238095
I was surprised to see such a low score for a single "r" omitted from a 15-character string.
So I replaced the second "r" in the first string with a "b". The only thing that changes is that the character omitted from the second string is now a "b" instead of an "r".
And now the score is much closer to what I expected:
In [74]: Levenshtein.jaro_winkler('guerbilla girls', 'guerilla girls')
Out[74]: 0.9866666666666667
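That second value is exactly what the textbook Jaro-Winkler arithmetic predicts. As a sanity check, here is the computation hand-worked for this pair (assuming the usual scaling factor p = 0.1 and a prefix cap of 4 characters; the pair has 14 matching characters and no transpositions):

m, t, len1, len2 = 14, 0, 15, 14                # 14 matches, 0 transpositions
jaro_sim = (m/len1 + m/len2 + (m - t)/m) / 3    # ≈ 0.977778
print(jaro_sim + 4 * 0.1 * (1 - jaro_sim))      # common prefix "guer" (4 chars) -> ≈ 0.986667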
I ran the same two tests with another library (jaro-winkler), and it gives identical scores in both situations (equal to the score python-Levenshtein gives for the second test):
In [77]: jaro.jaro_winkler_metric('guerrilla girls', 'guerilla girls')
Out[77]: 0.9866666666666667
In [78]: jaro.jaro_winkler_metric('guerbilla girls', 'guerilla girls')
Out[78]: 0.9866666666666667
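For comparison, here is a minimal sketch of the textbook algorithm (my own throwaway code, not any library's implementation; it assumes the standard match window of floor(max_len / 2) - 1, scaling factor p = 0.1, and a prefix cap of 4):

def jaro(s1, s2):
    # Characters count as matching when equal and within the window.
    window = max(0, max(len(s1), len(s2)) // 2 - 1)
    matched1 = [False] * len(s1)
    matched2 = [False] * len(s2)
    m = 0
    for i, c in enumerate(s1):
        for j in range(max(0, i - window), min(len(s2), i + window + 1)):
            if not matched2[j] and s2[j] == c:
                matched1[i] = matched2[j] = True
                m += 1
                break
    if m == 0:
        return 0.0
    # Transpositions: matched characters appearing in a different order, halved.
    k = t = 0
    for i in range(len(s1)):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                t += 1
            k += 1
    t //= 2
    return (m / len(s1) + m / len(s2) + (m - t) / m) / 3

def jaro_winkler(s1, s2, p=0.1, max_prefix=4):
    sim = jaro(s1, s2)
    # Winkler boost: reward a common prefix of up to max_prefix characters.
    l = 0
    while l < min(max_prefix, len(s1), len(s2)) and s1[l] == s2[l]:
        l += 1
    return sim + l * p * (1 - sim)

print(jaro_winkler('guerrilla girls', 'guerilla girls'))  # ≈ 0.986667
print(jaro_winkler('guerbilla girls', 'guerilla girls'))  # ≈ 0.986667

With this definition, the extra "r" in the first pair finds no unmatched partner inside the match window, so both pairs end up with 14 matches and 0 transpositions, and therefore the same score, which is what jaro-winkler returns. That makes python-Levenshtein's 0.9295 the outlier.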
What do you think? The first result is really weird, isn't it?