xdrop / fuzzywuzzy

Java fuzzy string matching implementation of the well-known Python fuzzywuzzy algorithm. Fuzzy search for Java.
GNU General Public License v2.0
822 stars, 118 forks

Incompatibility with the Python version in handling underscores #97

Closed: DoronRippel closed this issue 2 years ago.

DoronRippel commented 2 years ago

The FuzzySearch.tokenSetPartialRatio() method returns different results from the Python version for strings that contain underscores.
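A minimal sketch of where underscores can diverge, assuming the Python library's default preprocessing replaces every character matched by the regex `(?ui)\W` with a space, lowercases the result, and splits on whitespace. Because `\W` does not match `_` (underscore counts as a word character), `worm_mikeala` stays a single token on the Python side. The `SPLIT_UNDERSCORE` pattern is a hypothetical alternative, not taken from either codebase, showing what happens when an implementation treats `_` as a separator.

```python
import re

# Assumed to mirror fuzzywuzzy's default processor: (?ui)\W matches any
# non-word character, and "_" is a word character, so it is kept.
PYTHON_STYLE = re.compile(r"(?ui)\W")

# Hypothetical alternative: [\W_] also matches "_", so underscores
# become token separators.
SPLIT_UNDERSCORE = re.compile(r"[\W_]")


def tokenize(s, pattern):
    """Replace matched characters with spaces, lowercase, and split on whitespace."""
    return pattern.sub(" ", s).lower().split()


for s in ["worm_mikeala", "c_wasyluka", "x_bacdefg"]:
    print(s, tokenize(s, PYTHON_STYLE), tokenize(s, SPLIT_UNDERSCORE))
```

Under the first pattern each input is one token; under the second it splits into two, one of which also appears in the other string being compared.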

Examples:

xdrop commented 2 years ago

Thanks, I have pushed a commit to fix this. This will be fixed in 1.3.4.

DoronRippel commented 2 years ago

Great!

On Tue, Jan 18, 2022, 20:10 Panayiotis @.***> wrote:

Thanks, I have pushed a commit to fix this. I'll release this in a new version.

DoronRippel commented 2 years ago

Hi Panayiotis,

I saw that version 1.3.4 was released, so I assumed it contained the fix and switched to it. However, the issue is not fixed; worse, all of the examples now return 100.

Here is how I run them in Java:

```java
import me.xdrop.fuzzywuzzy.FuzzySearch;

System.out.println("expected 58 -> got " + FuzzySearch.tokenSetPartialRatio("worm_mikeala", "mikeala rath"));
System.out.println("expected 80 -> got " + FuzzySearch.tokenSetPartialRatio("c_wasyluka", "crystal wasyluka"));
System.out.println("expected 78 -> got " + FuzzySearch.tokenSetPartialRatio("a_bacdefg", "crystal bacdefg"));
```

I get:

```
expected 58 -> got 100
expected 80 -> got 100
expected 78 -> got 100
```

and here is how I run them in Python:

```python
from fuzzywuzzy import fuzz

if __name__ == '__main__':
    print(fuzz.partial_token_set_ratio("worm_mikeala", "mikeala rath"))
    print(fuzz.partial_token_set_ratio("c_wasyluka", "crystal wasyluka"))
    print(fuzz.partial_token_set_ratio("x_bacdefg", "crystal bacdefg"))
```

I get:

```
58
80
78
```
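The token-set ratios compare three strings built from the two token sets: the sorted intersection, and the intersection followed by each side's sorted leftover tokens. The sketch below models that decomposition on the structure of the Python library's `_token_set` helper; it does not reimplement `partial_ratio`, and the idea that the Java version splits on `_` is an unconfirmed hypothesis. If underscores are split, the shared token becomes a verbatim substring of both combined strings, and fuzzywuzzy's `partial_ratio` scores a verbatim substring as 100. If underscores are kept, the intersection is empty, so only the comparison between the two combined strings contributes, which is consistent with the lower Python scores.

```python
import re


def token_set(s, split_underscore):
    """Tokenize like the default processor; optionally also split on '_' (hypothetical)."""
    pattern = r"[\W_]" if split_underscore else r"\W"
    return set(re.sub(pattern, " ", s).lower().split())


def token_set_parts(s1, s2, split_underscore):
    """Return (intersection, intersection + diff1to2, intersection + diff2to1)."""
    t1, t2 = token_set(s1, split_underscore), token_set(s2, split_underscore)
    sect = " ".join(sorted(t1 & t2))
    combined_1to2 = (sect + " " + " ".join(sorted(t1 - t2))).strip()
    combined_2to1 = (sect + " " + " ".join(sorted(t2 - t1))).strip()
    return sect, combined_1to2, combined_2to1


for split in (False, True):
    sect, c12, c21 = token_set_parts("worm_mikeala", "mikeala rath", split)
    # A non-empty intersection always occurs verbatim inside both combined
    # strings, so a partial-ratio comparison against either one would be 100.
    print(split, repr(sect), repr(c12), repr(c21), bool(sect) and sect in c12)
```

With `split=False` the intersection is `''`; with `split=True` it is `'mikeala'`, a prefix of both `'mikeala worm'` and `'mikeala rath'`.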

Thank you,

Doron
