Matching results could be improved by applying Unicode normalization to the input strings. This should be a processor function, since users might still be interested in the distance without normalization. In addition, it would be weird if `Levenshtein.distance(s1, s2)` differed from `len(Levenshtein.editops(s1, s2))`. At the same time, normalization cannot be applied inside `Levenshtein.editops`, since each edit operation needs to map to a specific character position in the source string.
It would probably make sense to update `utils.default_process` to normalize strings as well.
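
To illustrate the idea, here is a minimal standalone sketch of normalization as an opt-in processor. It does not use the library itself: `normalize_processor` and `levenshtein` are hypothetical names written for this example, and the choice of NFC as the normalization form is an assumption, not something decided in this issue.

```python
import unicodedata


def normalize_processor(s, form="NFC"):
    # Hypothetical processor: NFC is an assumed default, other forms
    # (NFD, NFKC, NFKD) may be a better fit depending on the use case.
    return unicodedata.normalize(form, s)


def levenshtein(a, b, processor=None):
    # Plain dynamic-programming Levenshtein distance over code points.
    # This is a stand-in for illustration, not the library's implementation.
    if processor is not None:
        a, b = processor(a), processor(b)
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


composed = "caf\u00e9"      # "é" as a single code point
decomposed = "cafe\u0301"   # "e" followed by a combining acute accent

raw_distance = levenshtein(composed, decomposed)
normalized_distance = levenshtein(composed, decomposed,
                                  processor=normalize_processor)
print(raw_distance, normalized_distance)
```

Without the processor the two visually identical strings are two edits apart, while with it they compare as equal. The decomposed string has five code points and its normalized form has four, which is why edit operations computed on normalized text would no longer index into the original string.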