luozhouyang / python-string-similarity

A library implementing different string similarity and distance measures using Python.
MIT License

Speed of levenshtein #21

Closed reza1615 closed 3 years ago

reza1615 commented 4 years ago

Thank you for this great package. I compared its speed with other C-based Python packages, and it is slower. Is it possible to improve the speed?


```python
a = 'fsffvfdsbbdfvvdavavavavavava'
b = 'fvdaabavvvvvadvdvavavadfsfsdafvvav'

# Levenshtein distance with editdistance (IPython cell)
%%timeit
import editdistance
editdistance.eval(a, b)
# 2.12 µs ± 14.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

# Levenshtein distance with strsimpy (IPython cell)
%%timeit
from strsimpy.levenshtein import Levenshtein
levenshtein = Levenshtein()  # instance name no longer shadows the class
levenshtein.distance(a, b)
# 528 µs ± 990 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```

github-actions[bot] commented 4 years ago

Thanks for your first issue!

luozhouyang commented 4 years ago

editdistance is fast because it is implemented in C++. This library is implemented in pure Python, so it is much slower.
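
To make the cost of a pure-Python implementation concrete, here is a minimal sketch of a two-row dynamic-programming Levenshtein distance, timed with the standard `timeit` module. This is an illustration only and is not this library's actual code; the function name `levenshtein` and the loop count `n` are assumptions chosen for the example. The inner loop runs `len(a) * len(b)` times, and each iteration is executed by the interpreter, which is where most of the time goes.

```python
import timeit


def levenshtein(s: str, t: str) -> int:
    """Edit distance via two-row dynamic programming (illustrative sketch)."""
    if s == t:
        return 0
    if not s:
        return len(t)
    if not t:
        return len(s)

    # prev[j] holds the distance between the processed prefix of s and t[:j].
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]  # distance from s[:i] to the empty prefix of t
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


a = 'fsffvfdsbbdfvvdavavavavavava'
b = 'fvdaabavvvvvadvdvavavadfsfsdafvvav'

n = 1000  # hypothetical loop count, chosen only to keep the run short
seconds = timeit.timeit(lambda: levenshtein(a, b), number=n)
print(f"{seconds / n * 1e6:.1f} µs per call")
```

The absolute numbers depend on the machine and the Python version, so they are not directly comparable to the figures reported above.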