I'm finding errors raised by damerau_levenshtein_distance for code points that don't cause problems for levenshtein_distance. The following:
from jellyfish import damerau_levenshtein_distance, levenshtein_distance

cases = [
    ('NICHOLASŸ', 'NICHOLAS'),
    ('NICHOLAS\u0178', 'NICHOLAS'),
    ('ÀUĎREY', 'GERTRUDE'),
    ('\xc0U\u010eREY', 'GERTRUDE'),
]

for a, b in cases:
    try:
        l = levenshtein_distance(a, b)
    except ValueError as e:
        print("Problem calculating levenshtein_distance between %r and %r: %s" % (a, b, e))
    try:
        dl = damerau_levenshtein_distance(a, b)
    except ValueError as e:
        print("Problem calculating damerau_levenshtein_distance between %r and %r: %s" % (a, b, e))
run in Python 3.6 produces:
Problem calculating damerau_levenshtein_distance between 'NICHOLASŸ' and 'NICHOLAS': Encountered unsupported code point in string.
Problem calculating damerau_levenshtein_distance between 'NICHOLASŸ' and 'NICHOLAS': Encountered unsupported code point in string.
Problem calculating damerau_levenshtein_distance between 'ÀUĎREY' and 'GERTRUDE': Encountered unsupported code point in string.
Problem calculating damerau_levenshtein_distance between 'ÀUĎREY' and 'GERTRUDE': Encountered unsupported code point in string.
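
To narrow down which characters might be involved, it can help to list the code points in each input. The following is only a diagnostic sketch using the standard library. The helper name `code_points_above` and the U+00FF threshold are my own assumptions for illustration; I have not confirmed from the jellyfish source where the actual limit lies.

```python
def code_points_above(s, threshold=0xFF):
    """Return (char, 'U+XXXX') pairs for characters whose code point exceeds threshold.

    The default threshold of 0xFF is a guess, not a documented jellyfish limit.
    """
    return [(ch, "U+%04X" % ord(ch)) for ch in s if ord(ch) > threshold]


# Same inputs as in the report, written with escapes so the code points are explicit.
cases = [
    ('NICHOLAS\u0178', 'NICHOLAS'),
    ('\xc0U\u010eREY', 'GERTRUDE'),
]

for a, b in cases:
    for s in (a, b):
        print("%r: %s" % (s, code_points_above(s)))
```

Each failing pair contains exactly one character above U+00FF (Ÿ is U+0178, Ď is U+010E), while À (U+00C0) falls below it. That is consistent with a limit somewhere around that point, but it does not prove one.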