String fuzzy-matching From R to Python

There are a few important differences between the two packages:

1) In FuzzyWuzzy, limit specifies how many elements you want extract to return. extract does not provide an argument equivalent to maxDist. For this purpose you would have to use extractBests with the score_cutoff argument.

2) Stringdist appears to use an edit distance, while FuzzyWuzzy only provides normalized string metrics (0-100, where 100 is an exact match). So instead of a maximum distance you would have to specify a minimum score, e.g. score_cutoff=90. You can specify the string metric using the scorer argument.

3) FuzzyWuzzy preprocesses strings by default in the extract function (it lowercases them and replaces non-alphanumeric characters). You can disable this using processor=None.
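To make points 1) to 3) concrete, here is a minimal standard-library sketch of how a normalized score, a score cutoff, a limit and a preprocessing step interact. It is not FuzzyWuzzy's implementation: the helpers ratio, default_process and extract_bests are hypothetical names, and difflib.SequenceMatcher only stands in for the real scorer.

```python
from difflib import SequenceMatcher


def ratio(a: str, b: str) -> int:
    """Normalized similarity on a 0-100 scale (100 means identical)."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def default_process(s: str) -> str:
    """Assumed preprocessing: lowercase, non-alphanumerics to spaces, trim."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in s)
    return cleaned.lower().strip()


def extract_bests(query, choices, processor=default_process,
                  score_cutoff=0, limit=5):
    """Hypothetical helper: keep choices scoring >= score_cutoff, best first."""
    prep = processor if processor is not None else (lambda s: s)
    scored = [(choice, ratio(prep(query), prep(choice))) for choice in choices]
    kept = [pair for pair in scored if pair[1] >= score_cutoff]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:limit]  # limit caps the number of results, not their quality


print(extract_bests("hell!", ["HELLO", "WORLD"], score_cutoff=80))
# [('HELLO', 89)]
print(extract_bests("hell!", ["HELLO", "WORLD"], processor=None, score_cutoff=80))
# []
```

Note that the cutoff is a minimum similarity here, so it moves in the opposite direction to a maxDist: a higher cutoff is stricter. With processor=None the lowercase query no longer matches the uppercase choices at all.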

As an alternative, you could use RapidFuzz, which supports edit distances and accepts a score_cutoff parameter in the extract function. With a distance scorer, score_cutoff acts as a maximum distance:

>>> from rapidfuzz import process, string_metric
>>> process.extract("PARI", ["HELLO", "WORLD"], processor=None, scorer=string_metric.levenshtein, score_cutoff=2)
[]
>>> process.extract("HELL", ["HELLO", "WORLD"], processor=None, scorer=string_metric.levenshtein, score_cutoff=2)
[('HELLO', 1, 0)]
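Each result is a (choice, distance, index) tuple. For anyone who wants to see what that call computes, here is a plain-Python sketch under the assumption that a distance scorer keeps every choice whose Levenshtein distance is at most the cutoff. The functions levenshtein and extract_within are hypothetical helpers written for illustration, not RapidFuzz's code.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def extract_within(query, choices, max_dist):
    """Return (choice, distance, index) for choices within max_dist, closest first."""
    hits = [(c, levenshtein(query, c), i) for i, c in enumerate(choices)]
    hits = [h for h in hits if h[1] <= max_dist]
    hits.sort(key=lambda h: h[1])
    return hits


print(extract_within("PARI", ["HELLO", "WORLD"], max_dist=2))
# []
print(extract_within("HELL", ["HELLO", "WORLD"], max_dist=2))
# [('HELLO', 1, 0)]
```

As far as I know, newer RapidFuzz releases expose the Levenshtein scorer under rapidfuzz.distance rather than string_metric, so the import in the snippet above may need adjusting depending on the installed version.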
