roy-ht / editdistance

Fast implementation of the edit distance(Levenshtein distance)
MIT License
660 stars 61 forks source link

Accurate benchmarks? #36

Open ekreutz opened 5 years ago

ekreutz commented 5 years ago

Hey, I ran a quick benchmark of my own on, using:

In my tests python-Levenshtein is about 10x faster. Perhaps it's the macOS binaries? Or maybe your tests are outdated?

import editdistance
from Levenshtein._levenshtein import *
from timeit import timeit

s00 = "akjsdjkahsdhjkashd"
s01 = "akjsdjkahsdhj"
s10 = 'xyzz'
s11 = 'xab'
s20 = 'aaaaaaaaaaaaaaaaaaaaaaa'
s21 = 'i'

def a1():
    editdistance.eval(s00, s01)
    editdistance.eval(s10, s11)
    editdistance.eval(s20, s21)
def a2():
    distance(s00, s01)
    distance(s10, s11)
    distance(s20, s21)

print("editdistance")
print(timeit(a1, number=100000))

print("\npython-Levenshtein")
print(timeit(a2, number=100000))

Prints:

editdistance
0.330241583

python-Levenshtein
0.03681695899999998
desialex commented 5 years ago

I just compared those two in a real-life application and editdistance is about 30% faster.

maxbachmann commented 3 years ago

At least in my benchmarks this is largely dependent on the length of the input strings. Here is a comparision for different libraries using different string lengths. Both edlib and editdistance appear to have a lot of overhead for short strings.

Only python-Levenshtein uses a quadratic time implementation, while all others use Myers/Hyyrös bitparallel implementation.

dzieciou commented 3 years ago

@maxbachmann Great chart. It shows the choice of implementation really depends on the application.

For the latter rapidfuzz seems like a good choice.