b03505036 closed 1 year ago
A couple of things:
1) It is not entirely clear to me why you would like to rewrite the Python implementation in Cython. RapidFuzz provides a fast C++ implementation (using Cython to generate the wrapper code) and a pure Python fallback implementation. The pure Python implementation is used in case the C++ version fails to compile on a system, so rewriting it in Cython would defeat the purpose.
2) I assume you are benchmarking against rapidfuzz.distance.Levenshtein.distance, which is likely the C++ version, and not against rapidfuzz.distance.Levenshtein_py.distance. In addition, the library performs a lot of optimizations to be even faster when metrics like the Levenshtein distance are used with the process module (e.g. process.cdist).
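To see which implementation a benchmark actually exercises, something along these lines could be used. It is only a rough sketch: levenshtein_reference, collect_candidates and benchmark are hypothetical helper names, the reference function is a plain textbook Wagner-Fischer loop rather than RapidFuzz code, and the two RapidFuzz entries are only added when the package can be imported.

```python
import timeit


def levenshtein_reference(s1, s2):
    """Textbook Wagner-Fischer distance (illustrative baseline, not RapidFuzz code)."""
    if len(s1) < len(s2):
        s1, s2 = s2, s1  # keep the row buffer as short as possible
    previous = list(range(len(s2) + 1))
    for i, ch1 in enumerate(s1, start=1):
        current = [i]
        for j, ch2 in enumerate(s2, start=1):
            cost = 0 if ch1 == ch2 else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def collect_candidates():
    """Return a name -> callable mapping of the implementations to time."""
    candidates = {"reference (pure Python)": levenshtein_reference}
    try:
        # Module paths as named in this thread; skipped if RapidFuzz is missing.
        from rapidfuzz.distance import Levenshtein, Levenshtein_py
    except ImportError:
        return candidates
    candidates["rapidfuzz Levenshtein"] = Levenshtein.distance
    candidates["rapidfuzz Levenshtein_py"] = Levenshtein_py.distance
    return candidates


def benchmark(candidates, s1, s2, number=200):
    """Average seconds per call for every candidate on the same input pair."""
    return {
        name: timeit.timeit(lambda f=func: f(s1, s2), number=number) / number
        for name, func in candidates.items()
    }


if __name__ == "__main__":
    a, b = "kitten" * 20, "sitting" * 20
    for name, seconds in benchmark(collect_candidates(), a, b).items():
        print(f"{name}: {seconds * 1e6:.1f} us per call")
```

Timing each function separately on identical inputs makes it obvious whether a Cython rewrite is being compared against the C++ path or against the pure Python fallback. The numbers will vary by machine and input length.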
3) Your implementation looks like you used the Python implementation as a base and tried to add type hints. Your type hints are only correct if the following conditions hold true:
At the very least these conditions should be checked, so that the functions do not crash if they are called with invalid strings from Python. Both the C++ and Python implementations of RapidFuzz are able to handle arbitrary Unicode strings.
Thank you for the reply. Where can I find the C++ implementation?
The C++ implementation is integrated as a git submodule in extern/rapidfuzz-cpp to allow using it standalone in C++ applications. You can find it at https://github.com/maxbachmann/rapidfuzz-cpp.
I rewrote the Levenshtein_py distance in pure Cython, expecting it to be faster than Levenshtein_py. But when I tested it, it was 9 times slower than RapidFuzz. I am not an experienced Python developer, so I would like to ask what makes this repo run so fast, or how I can improve my pure Cython version.
Thank youuuu!