seatgeek / fuzzywuzzy

Fuzzy String Matching in Python
http://chairnerd.seatgeek.com/fuzzywuzzy-fuzzy-string-matching-in-python/
GNU General Public License v2.0

Implemented sort order matches by common letter count largest to smallest #295

ZihangH opened this pull request 3 years ago (status: open)

ZihangH commented 3 years ago

This pull request addresses the problem in this issue: https://github.com/seatgeek/fuzzywuzzy/issues/280

All code changes and unit tests are in process.py and test_fuzzywuzzy.py.

All old and new test cases in test_fuzzywuzzy.py pass.
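The PR diff itself is not reproduced on this page. As a rough, hypothetical illustration of the idea in the title, the sketch below takes `(choice, score)` pairs of the shape `process.extract` returns for a list of choices and re-sorts them so that, among equal scores, choices sharing more letters with the query come first. The helper names `common_letter_count` and `sort_by_common_letters`, the case-insensitive counting of letters and digits only, and the scores in the example are assumptions for illustration, not the PR's code or part of fuzzywuzzy's API.

```python
from collections import Counter
from typing import List, Tuple


def common_letter_count(query: str, choice: str) -> int:
    """Hypothetical helper: letters/digits shared by both strings, counted with
    multiplicity and ignoring case (spaces and punctuation are skipped)."""
    q = Counter(ch for ch in query.lower() if ch.isalnum())
    c = Counter(ch for ch in choice.lower() if ch.isalnum())
    # Counter '&' keeps the minimum count of each shared character.
    return sum((q & c).values())


def sort_by_common_letters(query: str, matches: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Hypothetical re-ranking of (choice, score) pairs: highest score first,
    then, among equal scores, the most common letters first."""
    return sorted(matches, key=lambda m: (-m[1], -common_letter_count(query, m[0])))


# Illustrative, made-up scores in the (choice, score) shape that
# process.extract returns for a list of choices.
matches = [("york", 100), ("new york mets", 100), ("boston", 40)]
print(sort_by_common_letters("new york", matches))
# [('new york mets', 100), ('york', 100), ('boston', 40)]
```

Because Python's `sorted` is stable, matches that tie on both keys keep the order the scorer produced, so the existing "take the first" behaviour is preserved for full ties.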

maxbachmann commented 3 years ago

Overall, I am not convinced this should be added at all, for two reasons: 1) it adds more arguments, which makes the function increasingly hard to use correctly; 2) in my opinion this does not belong in process.*. Under token_set_ratio these results are all equally similar, so taking the first one is well-defined behaviour. If users want to prefer matches that, for example, have many characters in common, they should use a different scorer that combines the results of multiple string metrics. A good example is fuzz.WRatio, which is implemented as a separate scorer that combines multiple metrics.
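The maintainer's alternative is to encode the preference in the scorer rather than in `process.*`. A minimal sketch of such a combined scorer follows. To keep it runnable without fuzzywuzzy, it uses two stand-in metrics from the standard library: a token-overlap score that, like `token_set_ratio`, gives 100 when one string's words are a subset of the other's, and a character-level score from `difflib.SequenceMatcher`. The function names and the 0.7/0.3 weighting are assumptions for illustration; this is not how `fuzz.WRatio` is implemented.

```python
from difflib import SequenceMatcher


def token_overlap(a: str, b: str) -> float:
    """Hypothetical stand-in token metric (0-100): shared words divided by the
    smaller word set, so a word subset scores 100. NOT fuzz.token_set_ratio."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 100.0 if ta == tb else 0.0
    return 100.0 * len(ta & tb) / min(len(ta), len(tb))


def char_similarity(a: str, b: str) -> float:
    """Character-level similarity (0-100) via difflib; a stand-in for fuzz.ratio."""
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def combined_scorer(a: str, b: str, token_weight: float = 0.7) -> int:
    """Hypothetical scorer: weighted blend of token overlap and character
    similarity. The character term separates candidates the token term ties."""
    token = token_overlap(a, b)
    chars = char_similarity(a, b)
    return round(token_weight * token + (1 - token_weight) * chars)


query = "new york"
choices = ["new york yankees", "new york mets"]
# Both choices score 100 on token_overlap; the character term breaks the tie.
ranked = sorted(choices, key=lambda c: combined_scorer(query, c), reverse=True)
print(ranked)  # ['new york mets', 'new york yankees']
```

With fuzzywuzzy installed, a function of this shape could be passed as the `scorer` argument, e.g. `process.extract(query, choices, scorer=combined_scorer)`, and real metrics such as `fuzz.token_set_ratio` and `fuzz.ratio` could replace the stand-ins. This keeps `process.extract`'s signature unchanged, which is the maintainer's first concern.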