xdrop / fuzzywuzzy

Java fuzzy string matching implementation of the well known Python's fuzzywuzzy algorithm. Fuzzy search for Java
GNU General Public License v2.0

Inconsistent results from extractOne and extractTop #83

Open eswarn24 opened 4 years ago

eswarn24 commented 4 years ago

I see different results from extractOne and extractTop when they are called with the same query string and collection.

I have a fairly long collection (15k strings) to search for each query.

For instance, take the following scenario.

Query: ABC 1721

The collection contains the following strings, among many others:

ABC1721
ABC1721-FGH/L9
ABC MERAKI Z1
EFGD3111/Z1-ABC

extractOne("ABC 1721", collection) gives - ABC1721, Ratio - 95

extractTop("ABC 1721", collection,1) gives - ABC1721, Ratio - 95

The problem arises when I ask for the top 5 results:

extractTop("ABC 1721", collection,5)

Match 1 - ABC1721-FGH/L9, Ratio - 86
Match 2 - ABC MERAKI Z1, Ratio - 86
Match 3 - EFGD3111/Z1-ABC, Ratio - 86
and so on

I tried 'extractSorted' as well; its results are not consistent with extractOne either.

I ran extractTop (top 5) and extractOne on 1000+ queries. For around 70% of them, the first match from extractTop differs from the result of extractOne.
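To make the expectation concrete, here is a minimal, self-contained sketch of the check I have in mind. It assumes top-1 and top-k are both derived from the same per-string score, so the head of a top-k list should carry the same score as the top-1 result (the strings may differ only when scores tie). `ConsistencyCheck`, `Scored`, `bestOf`, `topOf` and `countMismatches` are hypothetical names of my own, not the library's API, and the length-based scorer in `main` is only a placeholder for a real ratio.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.ToIntBiFunction;

// Hypothetical harness for illustration; none of these names come from the library.
class ConsistencyCheck {

    // A candidate string paired with its score.
    static final class Scored {
        final String value;
        final int score;
        Scored(String value, int score) { this.value = value; this.score = score; }
    }

    // Top-1: the first candidate holding the highest score.
    static Scored bestOf(String query, List<String> choices,
                         ToIntBiFunction<String, String> scorer) {
        Scored best = null;
        for (String c : choices) {
            int s = scorer.applyAsInt(query, c);
            if (best == null || s > best.score) best = new Scored(c, s);
        }
        return best;
    }

    // Top-k: score everything, sort by descending score, keep the first `limit`.
    static List<Scored> topOf(String query, List<String> choices, int limit,
                              ToIntBiFunction<String, String> scorer) {
        List<Scored> all = new ArrayList<>();
        for (String c : choices) all.add(new Scored(c, scorer.applyAsInt(query, c)));
        // List.sort is stable, so tied scores keep their input order.
        all.sort((a, b) -> Integer.compare(b.score, a.score));
        return new ArrayList<>(all.subList(0, Math.min(limit, all.size())));
    }

    // Counts queries whose top-k head scores differently from the top-1 result.
    static int countMismatches(List<String> queries, List<String> choices, int limit,
                               ToIntBiFunction<String, String> scorer) {
        int mismatches = 0;
        for (String q : queries) {
            Scored one = bestOf(q, choices, scorer);
            List<Scored> top = topOf(q, choices, limit, scorer);
            if (one == null || top.isEmpty()) continue; // nothing to compare
            if (one.score != top.get(0).score) mismatches++;
        }
        return mismatches;
    }

    public static void main(String[] args) {
        List<String> choices = Arrays.asList(
                "ABC1721", "ABC1721-FGH/L9", "ABC MERAKI Z1", "EFGD3111/Z1-ABC");
        // Placeholder scorer (assumption): penalises length difference only.
        ToIntBiFunction<String, String> scorer =
                (q, c) -> 100 - Math.abs(q.length() - c.length());
        int bad = countMismatches(Arrays.asList("ABC 1721"), choices, 5, scorer);
        System.out.println("mismatches: " + bad);
    }
}
```

Comparing scores rather than strings keeps plain ties from being counted as mismatches. In my case the scores themselves differ (95 vs 86), so ties do not explain it.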

BTW, I appreciate your efforts in porting the Python logic to Java without any performance lag.