Open ZihangH opened 3 years ago
Overall, I am not really convinced that this should be added at all, for two reasons: 1) it adds more arguments, which makes the function increasingly hard to use correctly; 2) in my opinion, this does not belong in process.*. When using token_set_ratio, these results are indeed all equally similar, so taking the first one is well-defined behaviour. A user who wants to prefer matches that, e.g., have many characters in common should use a different scorer that combines the results of multiple string metrics. A good example of this is fuzz.WRatio, which is implemented as a separate scorer that combines multiple metrics.
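To illustrate the "use a different scorer" suggestion, here is a minimal, standard-library-only sketch. It does not use fuzzywuzzy itself: `token_overlap`, `char_ratio`, `combined_scorer`, and `extract_one` are hypothetical stand-ins, and `token_overlap` is only a rough approximation of a token-set style metric, not the real `token_set_ratio`. The point is just that ties under one metric can be broken inside the scorer rather than by adding arguments to the extraction function.

```python
from difflib import SequenceMatcher


def char_ratio(a: str, b: str) -> float:
    """Character-level similarity on a 0-100 scale (stdlib difflib)."""
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def token_overlap(query: str, choice: str) -> float:
    """Percentage of the query's tokens that also occur in the choice.

    Hypothetical stand-in for a token-set style metric.
    """
    q_tokens = set(query.lower().split())
    c_tokens = set(choice.lower().split())
    if not q_tokens or not c_tokens:
        return 0.0
    return 100.0 * len(q_tokens & c_tokens) / len(q_tokens)


def combined_scorer(query: str, choice: str, weight: float = 0.8) -> float:
    """Weighted mix of both metrics; the 0.8 weight is an arbitrary assumption."""
    return (weight * token_overlap(query, choice)
            + (1.0 - weight) * char_ratio(query, choice))


def extract_one(query, choices, scorer):
    """Return (choice, score) for the best match; the first one wins on ties."""
    return max(((c, scorer(query, c)) for c in choices), key=lambda pair: pair[1])


query = "new york"
choices = ["new york mets vs atlanta braves", "new york mets"]

# Both choices contain every query token, so the token metric ties
# and the first choice is returned.
tied_choice, _ = extract_one(query, choices, token_overlap)

# The combined scorer breaks the tie in favour of the closer string.
best_choice, _ = extract_one(query, choices, combined_scorer)
```

In fuzzywuzzy, the same idea would presumably be expressed by passing a custom callable as the `scorer` argument of `process.extractOne`, leaving the extraction function's signature unchanged.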
This pull request addresses the problem in this issue: https://github.com/seatgeek/fuzzywuzzy/issues/280
All code changes and unit tests are in process.py and test_fuzzywuzzy.py.
All old and new test cases in test_fuzzywuzzy.py pass.