ztane / python-Levenshtein

The Levenshtein Python C extension module contains functions for fast computation of Levenshtein distance and string similarity
GNU General Public License v2.0

Q: restrict operations #32

Open kingjr opened 7 years ago

kingjr commented 7 years ago

Hi,

Is it possible to restrict the matching to a subset of the edit operations? For example:

# default
changes = editops(A, B, operations=('insert', 'delete', 'replace'))
# as opposed to
changes = editops(A, B, operations=('insert', 'delete'))
# or
changes = editops(A, B, operations=('replace', 'delete'))

thanks!

maxbachmann commented 2 years ago

This is not possible in python-Levenshtein, but at least one of these cases can be achieved with RapidFuzz (https://github.com/maxbachmann/RapidFuzz):

> changes = editops(A, B, operations=('insert', 'delete', 'replace'))

changes = rapidfuzz.distance.Levenshtein.editops(A, B)

> changes = editops(A, B, operations=('insert', 'delete'))

changes = rapidfuzz.distance.Indel.editops(A, B)
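To illustrate what restricting the operations to insert and delete means, here is a minimal pure-Python sketch. The name `indel_editops` and the `(operation, source_pos, destination_pos)` tuple layout are assumptions made for this example. It is not RapidFuzz's implementation, and its output is not guaranteed to match what the library returns.

```python
def indel_editops(a, b):
    """Return (op, i, j) tuples that turn a into b using only
    'insert' and 'delete' (hypothetical helper, not a library API)."""
    n, m = len(a), len(b)
    # dist[i][j] = insert/delete distance between a[i:] and b[j:]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n, -1, -1):
        for j in range(m, -1, -1):
            if i == n:
                dist[i][j] = m - j            # insert the rest of b
            elif j == m:
                dist[i][j] = n - i            # delete the rest of a
            elif a[i] == b[j]:
                dist[i][j] = dist[i + 1][j + 1]
            else:
                dist[i][j] = 1 + min(dist[i + 1][j], dist[i][j + 1])

    # Walk the table from the start and record the chosen operations.
    ops, i, j = [], 0, 0
    while i < n or j < m:
        if i < n and j < m and a[i] == b[j]:
            i, j = i + 1, j + 1               # characters match, no operation
        elif j == m or (i < n and dist[i + 1][j] <= dist[i][j + 1]):
            ops.append(("delete", i, j))
            i += 1
        else:
            ops.append(("insert", i, j))
            j += 1
    return ops


print(indel_editops("test", "teste"))
```

Without a replace operation, a changed character costs two operations (one delete plus one insert) instead of one.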

> changes = editops(A, B, operations=('replace', 'delete'))

This is not possible, and I am unsure how it would be implemented. I plan to add something along the lines of changes = rapidfuzz.distance.Levenshtein.editops(A, B, weights=(1,1,1)). This would allow you to make insertions very expensive, so they would be avoided where possible. However, when insertions are not allowed, not every sequence can be converted into every other sequence. E.g.:

editops("test", "teste", operations=('replace', 'delete'))

would not be possible, since converting "test" to "teste" always requires an insertion into the first string.
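The weighting idea and the impossibility case can be sketched in plain Python. This is only an illustration under stated assumptions: `restricted_distance` is a hypothetical helper, the `(insert, delete, replace)` weight order is chosen for this example, and forbidding an operation by giving it an infinite weight is one possible design, not something either library is confirmed to do.

```python
import math


def restricted_distance(a, b, weights=(1, 1, 1)):
    """Weighted edit distance; weights = (insert, delete, replace).

    Pass math.inf as a weight to forbid that operation. Returns None
    when a cannot be turned into b with the allowed operations.
    Hypothetical helper for illustration, not a library API.
    """
    ins, dele, rep = weights
    # Row 0: build b[:j] from the empty string using j insertions.
    # The j == 0 case is handled separately to avoid 0 * inf.
    prev = [j * ins if j else 0 for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        cur = [i * dele]                      # delete the first i characters
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + dele,                          # delete ca
                cur[j - 1] + ins,                        # insert cb
                prev[j - 1] + (0 if ca == cb else rep),  # match or replace
            ))
        prev = cur
    return None if math.isinf(prev[-1]) else prev[-1]


no_insert = (math.inf, 1, 1)
print(restricted_distance("test", "teste"))             # all operations allowed
print(restricted_distance("test", "teste", no_insert))  # target is longer
print(restricted_distance("teste", "test", no_insert))  # target is shorter
```

With insertions forbidden, the conversion can only succeed when the target is no longer than the source, which is why the "test" to "teste" case has no solution.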