Extend WeightedLevenshtein to have customizable insert / deletion weights.
Previously, insert / deletion weights were hardcoded at 1.0. Customizing
them allows the caller, for example, to under-weight the insertion of a
thin letter like I or l to reflect the likelihood of OCR errors.
This adds a new interface, CharacterInsDelInterface, which is an
adjunct to CharacterSubstitutionInterface. The old behavior is preserved
if the caller does not provide a CharacterInsDelInterface implementation.
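
As a rough illustration of the idea (not the code in this PR), the sketch below computes a weighted Levenshtein distance with per-character insert / deletion costs. The names WeightedLevenshteinSketch, SubstitutionCost, InsDelCost, UNIT_INS_DEL and THIN_LETTERS are hypothetical stand-ins invented for this example; the method names and signatures of the real CharacterInsDelInterface may differ.

```java
public class WeightedLevenshteinSketch {

    /** Hypothetical stand-in for CharacterSubstitutionInterface. */
    public interface SubstitutionCost {
        double cost(char c1, char c2);
    }

    /** Hypothetical stand-in for CharacterInsDelInterface. */
    public interface InsDelCost {
        double insertionCost(char c);
        double deletionCost(char c);
    }

    /** Mirrors the old behavior: every insert / delete costs 1.0. */
    public static final InsDelCost UNIT_INS_DEL = new InsDelCost() {
        public double insertionCost(char c) { return 1.0; }
        public double deletionCost(char c) { return 1.0; }
    };

    /** Example weights (assumed values): thin letters I and l cost 0.5. */
    public static final InsDelCost THIN_LETTERS = new InsDelCost() {
        private double weight(char c) { return (c == 'I' || c == 'l') ? 0.5 : 1.0; }
        public double insertionCost(char c) { return weight(c); }
        public double deletionCost(char c) { return weight(c); }
    };

    /** Cost of turning s1 into s2; a null insDel falls back to unit costs. */
    public static double distance(String s1, String s2,
                                  SubstitutionCost sub, InsDelCost insDel) {
        if (insDel == null) {
            insDel = UNIT_INS_DEL;
        }
        double[] prev = new double[s2.length() + 1];
        double[] curr = new double[s2.length() + 1];

        // First row: build each prefix of s2 from the empty string by insertions.
        for (int j = 1; j <= s2.length(); j++) {
            prev[j] = prev[j - 1] + insDel.insertionCost(s2.charAt(j - 1));
        }

        for (int i = 1; i <= s1.length(); i++) {
            char c1 = s1.charAt(i - 1);
            // First column: delete every character of the s1 prefix.
            curr[0] = prev[0] + insDel.deletionCost(c1);
            for (int j = 1; j <= s2.length(); j++) {
                char c2 = s2.charAt(j - 1);
                double substitute = prev[j - 1] + (c1 == c2 ? 0.0 : sub.cost(c1, c2));
                double insert = curr[j - 1] + insDel.insertionCost(c2);
                double delete = prev[j] + insDel.deletionCost(c1);
                curr[j] = Math.min(substitute, Math.min(insert, delete));
            }
            // Swap rows so prev always holds the most recently completed row.
            double[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[s2.length()];
    }

    public static void main(String[] args) {
        SubstitutionCost unitSub = (a, b) -> 1.0;
        // Inserting the thin letter 'l' is cheap with the custom weights.
        System.out.println(distance("Helo", "Hello", unitSub, THIN_LETTERS)); // prints 0.5
        // Without custom weights the insertion costs the old 1.0.
        System.out.println(distance("Helo", "Hello", unitSub, null)); // prints 1.0
    }
}
```

Passing null for the insert / deletion argument in this sketch reproduces the fixed 1.0 cost, which is the backward-compatibility property described above.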
This also adds insert / deletion tests to the old
WeightedLevenshteinTest.testDistance, and adds a new
testDistanceCharacterInsDelInterface test.
Coverage increased (+0.1%) to 94.949% when pulling cfcde791e2bbbe50fcdec2e3c3a983722113d6fc on NationalBI:weighted-levenshtein-ins-del into a5d842111753f77bb679c82c37628338f868aec8 on tdebatty:master.