taleinat / fuzzysearch

Find parts of long text or data, allowing for some changes/typos.
MIT License
301 stars, 26 forks

How does it compare with RapidFuzz and FuzzyWuzzy? #39

Open sanjeevpe opened 2 years ago

sanjeevpe commented 2 years ago

Hi, thank you for this repository. I was wondering whether you have benchmarked fuzzysearch against RapidFuzz and FuzzyWuzzy for speed and accuracy when computing Levenshtein distance?

https://github.com/maxbachmann/RapidFuzz

taleinat commented 2 years ago

Hi, good question! This should be clearly addressed in the documentation.

The main difference is that fuzzysearch is intended for searching through long texts or sequences for partially-matching sub-strings or sub-sequences. FuzzyWuzzy and RapidFuzz, on the other hand, are intended for comparing pairs of strings and calculating similarity metrics (such as the Levenshtein distance) on them.

These are very different use cases, and the solutions are very different as well.
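To make the distinction concrete, here is a minimal pure-Python sketch of the two problems. It uses neither fuzzysearch nor RapidFuzz, and the function names `levenshtein` and `best_substring_distance` are made up for this illustration. The first function compares two whole strings. The second uses a textbook dynamic-programming variant (often attributed to Sellers) in which a match may start anywhere in the text; it is not a description of how fuzzysearch is implemented.

```python
def levenshtein(a, b):
    """Pairwise comparison: edit distance between two whole strings."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or free match
            ))
        prev = curr
    return prev[-1]


def best_substring_distance(pattern, text):
    """Search: smallest edit distance between pattern and ANY substring
    of text. Returns (distance, end), where the best match ends just
    before text[end]."""
    # The only change from levenshtein(): the first row is all zeros,
    # so a match may begin at any position in text at no cost.
    prev = [0] * (len(text) + 1)
    for i, cp in enumerate(pattern, start=1):
        curr = [i]
        for j, ct in enumerate(text, start=1):
            curr.append(min(
                prev[j] + 1,
                curr[j - 1] + 1,
                prev[j - 1] + (cp != ct),
            ))
        prev = curr
    best = min(prev)
    return best, prev.index(best)  # first end position with the best score


# Comparing the whole strings penalises all the surrounding text:
print(levenshtein("PATTERN", "---PATERN---"))              # 7
# Searching inside the text finds the near-match "PATERN":
print(best_substring_distance("PATTERN", "---PATERN---"))  # (1, 9)
```

The pairwise distance is dominated by the length difference between the two strings, which is why a similarity metric between a short pattern and a long text says little about whether the pattern occurs in it. For real use, fuzzysearch's `find_near_matches` (with a limit such as `max_l_dist`) covers the search case.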

(I'm leaving this open as a reminder to improve the docs in this regard.)

sanjeevpe commented 2 years ago

Thank you for your reply; it is helpful.