luozhouyang / python-string-similarity

A library implementing different string similarity and distance measures using Python.
MIT License

Words instead of characters #17

Closed: evyasonov closed this issue 4 years ago

evyasonov commented 4 years ago

Hey

Thank you for the package! It's really amazing!

I'm wondering if it's possible to use words instead of characters in the Levenshtein and Damerau-Levenshtein methods?

The result of the code below is 10:

from similarity.levenshtein import Levenshtein
print(Levenshtein().distance('Hello world', 'Hello brave new world'))

But if the algorithm compared words instead of characters, the result would be 2 (insert 'brave', insert 'new').

Is it possible?

luozhouyang commented 4 years ago

Hi @evyasonov, it's quite easy, actually!

>>> from strsimpy.levenshtein import Levenshtein
>>> lv = Levenshtein()
>>> lv.distance('Hello world'.split(), 'Hello brave new world'.split())
2
>>>
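For readers wondering why splitting is enough: edit distance only needs to index into its inputs and compare elements for equality, so a list of words can stand in for a string of characters. The snippet below is a minimal standalone sketch of that idea, not the library's actual implementation. The function name `sequence_levenshtein` is hypothetical and used only for illustration.

```python
from typing import Sequence


def sequence_levenshtein(a: Sequence, b: Sequence) -> int:
    """Edit distance between two sequences (hypothetical helper).

    Counts insertions, deletions and substitutions of whole elements,
    so the elements may be characters (str input) or words (list input).
    """
    # previous[j] is the distance between the first i elements of `a`
    # and the first j elements of `b`; start with i = 0.
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to an empty sequence
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(
                previous[j] + 1,         # delete item_a
                current[j - 1] + 1,      # insert item_b
                previous[j - 1] + cost,  # match or substitute
            ))
        previous = current
    return previous[-1]


# Character-level: the strings are compared one character at a time.
chars = sequence_levenshtein('Hello world', 'Hello brave new world')
# Word-level: split() turns each string into a list of words first.
words = sequence_levenshtein('Hello world'.split(),
                             'Hello brave new world'.split())
print(chars, words)  # 10 2
```

Note that `split()` with no arguments splits on any run of whitespace, so punctuation stays attached to words ('world' and 'world!' would count as different tokens). Whether that is acceptable depends on the use case.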
evyasonov commented 4 years ago

@luozhouyang Wow! Amazing! Thank you very much for your answer!