seatgeek / fuzzywuzzy

Fuzzy String Matching in Python
http://chairnerd.seatgeek.com/fuzzywuzzy-fuzzy-string-matching-in-python/
GNU General Public License v2.0
9.2k stars, 878 forks

How to decrease False positive matches? (process.extract / WRatio) #328

Open Pranav082001 opened 2 years ago

Pranav082001 commented 2 years ago

I am using the process.extract method, and I know it uses WRatio under the hood to calculate the score. In the following case I get a very high score of 90 even though the strings are hardly equal. Is there any way to fix this in WRatio?

inp_name = "america"
name_list = ["american Futures and Options Exchange"]
process.extractOne(inp_name, name_list)

Output: ('american Futures and Options Exchange', 90.0, 0)

PS: I know about the alternatives such as fuzz.ratio, partial_ratio, and token_sort_ratio, but WRatio works pretty well for my use case, so any workaround would be appreciated. Thanks!

maxbachmann commented 2 years ago

You could write your own version of WRatio that does not fall back to the partial versions of the algorithms.
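To see why the partial fallback produces 90 here, the sketch below retraces the relevant arithmetic. It is based on my reading of WRatio in fuzz.py, so the thresholds (1.5 and 8) and the scale factors (0.9 and 0.6) should be checked against the version you have installed. The function name `partial_branch_breakdown` is made up for illustration, and the substring test is only a shortcut for the one case where partial_ratio is trivially 100.

```python
def partial_branch_breakdown(query, choice):
    """Retrace the length check WRatio appears to use before trying partial matching.

    Assumes both strings are non-empty. Constants are copied from a reading of
    fuzzywuzzy's fuzz.py and may differ between versions.
    """
    p1, p2 = query.lower(), choice.lower()
    shorter, longer = sorted((p1, p2), key=len)

    # Ratio of the longer length to the shorter length.
    len_ratio = len(longer) / len(shorter)

    # Partial matching is skipped only when the lengths are similar.
    try_partial = len_ratio >= 1.5

    # The partial score is scaled down, more heavily for very uneven lengths.
    partial_scale = 0.6 if len_ratio > 8 else 0.9

    # Shortcut: if the shorter string occurs verbatim in the longer one,
    # the best partial alignment is a perfect match (100). Otherwise this
    # sketch does not attempt to compute it.
    partial = 100 if shorter in longer else None

    scaled = None
    if try_partial and partial is not None:
        scaled = partial * partial_scale
    return len_ratio, try_partial, partial_scale, scaled


len_ratio, try_partial, partial_scale, scaled = partial_branch_breakdown(
    "america", "american Futures and Options Exchange"
)
print(len_ratio, try_partial, partial_scale, scaled)
```

For this pair the longer string is a little over five times the length of the shorter one, so the partial branch is taken, and "america" is a verbatim substring of "american", so a perfect partial match scaled by 0.9 gives the reported 90.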

Pranav082001 commented 2 years ago

Could you please help me? Do I need to set try_partial to False inside def WRatio? https://github.com/seatgeek/fuzzywuzzy/blob/af443f918eebbccff840b86fa606ac150563f466/fuzzywuzzy/fuzz.py#L272

maxbachmann commented 2 years ago

Yes, that's what I would try.
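A minimal sketch of that idea, for anyone landing here: a WRatio-like scorer that only ever takes the non-partial branch. It uses only the standard library (difflib) so it runs without fuzzywuzzy installed, which means the scores will not be identical to fuzzywuzzy's. The helper names, the normalization step, and the 0.95 weight are assumptions modeled on my reading of fuzz.py, not the library's own code.

```python
import re
from difflib import SequenceMatcher

UNBASE_SCALE = 0.95  # assumed weight for token-based scores, as in fuzz.py


def _normalize(s):
    """Rough stand-in for fuzzywuzzy's preprocessing: lowercase, drop punctuation, trim."""
    return re.sub(r"\W+", " ", s.lower()).strip()


def _ratio(a, b):
    """Plain similarity ratio on a 0-100 scale."""
    return 100.0 * SequenceMatcher(None, a, b).ratio()


def _token_sort_ratio(a, b):
    """Compare the strings after sorting their tokens alphabetically."""
    return _ratio(" ".join(sorted(a.split())), " ".join(sorted(b.split())))


def _token_set_ratio(a, b):
    """Compare the shared tokens against each string's shared-plus-unique tokens."""
    t1, t2 = set(a.split()), set(b.split())
    common = " ".join(sorted(t1 & t2))
    c1 = (common + " " + " ".join(sorted(t1 - t2))).strip()
    c2 = (common + " " + " ".join(sorted(t2 - t1))).strip()
    return max(_ratio(common, c1), _ratio(common, c2), _ratio(c1, c2))


def wratio_no_partial(s1, s2):
    """WRatio-style score that never falls back to the partial algorithms."""
    p1, p2 = _normalize(s1), _normalize(s2)
    if not p1 or not p2:
        return 0
    base = _ratio(p1, p2)
    tsor = _token_sort_ratio(p1, p2) * UNBASE_SCALE
    tser = _token_set_ratio(p1, p2) * UNBASE_SCALE
    return round(max(base, tsor, tser))


print(wratio_no_partial("america", "american Futures and Options Exchange"))
```

With the real library you would pass such a function through the scorer argument, for example process.extractOne(inp_name, name_list, scorer=my_scorer). The trade-off is that genuine substring matches (a short query that is a legitimate abbreviation of a long name) will also score low, so it is worth checking this against a sample of your data before switching.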