Open boechat107 opened 4 years ago
I posted a similar ticket for this same issue #272.
Instead of your solution, I would propose sorting the tokens more smartly, e.g. by counting matching n-grams between the tokens of both strings, before applying `fuzz.ratio`.
I fear that calculating the full ratio for every possible permutation will make computation time explode (k tokens give k! orderings), especially when running this function over a large set of examples, e.g. in database matching.
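
A minimal sketch of the n-gram idea, under some assumptions: the helper names (`char_ngrams`, `align_tokens`, `aligned_ratio`) are hypothetical and not part of fuzzywuzzy, the matching is a simple greedy pass over character bigrams, and `difflib.SequenceMatcher` from the standard library stands in for `fuzz.ratio` so the snippet runs without extra dependencies.

```python
from difflib import SequenceMatcher


def char_ngrams(token, n=2):
    """Return the set of character n-grams of a token (the token itself if shorter than n)."""
    if len(token) < n:
        return {token}
    return {token[i:i + n] for i in range(len(token) - n + 1)}


def align_tokens(s1, s2, n=2):
    """Reorder the tokens of s2 so each lines up with its best n-gram match in s1."""
    tokens1 = s1.lower().split()
    remaining = s2.lower().split()
    ordered = []
    for t1 in tokens1:
        if not remaining:
            break
        grams1 = char_ngrams(t1, n)
        # Greedy choice: the unused token of s2 sharing the most n-grams with t1.
        best = max(remaining, key=lambda t2: len(grams1 & char_ngrams(t2, n)))
        ordered.append(best)
        remaining.remove(best)
    # Tokens of s2 without a counterpart in s1 keep their relative order at the end.
    return " ".join(tokens1), " ".join(ordered + remaining)


def aligned_ratio(s1, s2, n=2):
    """Similarity in [0, 100] after n-gram based token alignment."""
    a, b = align_tokens(s1, s2, n)
    return round(100 * SequenceMatcher(None, a, b).ratio())
```

With k tokens per string this needs on the order of k² token comparisons and a single ratio call, instead of k! ratio calls. The greedy pass is not guaranteed to find the best ordering; it is only meant to illustrate the trade-off.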
Uhum, I see. I think you do have a point, @shbunder.
My motivation to suggest this feature comes from this example:
At least in my context, it is pretty clear that both strings are very, very similar, but Python's `sorted` function sorts the tokens in an "undesired" way:

It might be interesting to add a function that calculates the match for all possible permutations of tokens:
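
A minimal sketch of what such a function might look like, with assumptions stated up front: `permutation_ratio` and its `max_tokens` guard are hypothetical names, not part of fuzzywuzzy, and `difflib.SequenceMatcher` is used as a stand-in for `fuzz.ratio` to keep the example dependency-free.

```python
from difflib import SequenceMatcher
from itertools import permutations


def ratio(a, b):
    """Stand-in for fuzz.ratio: similarity of two strings scaled to [0, 100]."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def permutation_ratio(s1, s2, max_tokens=6):
    """Best ratio between s1 and any ordering of the tokens of s2."""
    tokens1 = s1.lower().split()
    tokens2 = s2.lower().split()
    # k tokens produce k! orderings, so refuse inputs that would blow up.
    if len(tokens2) > max_tokens:
        raise ValueError("too many tokens to permute: %d" % len(tokens2))
    target = " ".join(tokens1)
    return max(ratio(target, " ".join(p)) for p in permutations(tokens2))
```

The `max_tokens` limit is there because the number of orderings grows factorially: 6 tokens already mean 720 ratio calls per pair of strings.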
Does it make sense for someone else?
PS: `fuzzywuzzy` is a great library!