seatgeek / fuzzywuzzy

Fuzzy String Matching in Python
http://chairnerd.seatgeek.com/fuzzywuzzy-fuzzy-string-matching-in-python/
GNU General Public License v2.0

Issue with scores. #19

Closed altruist123 closed 11 years ago

altruist123 commented 11 years ago

Hi, I do the following using fuzzywuzzy:

    from fuzzywuzzy import process

    choices = ["BestBuy", "ebay", "overstock", "rakuten", "sears"]
    match1 = process.extractOne("ebay - asdlfjlksj ", choices)
    match2 = process.extractOne("thebay - asdlfjlksj ", choices)

If I print match1 and match2, I get the following:

    match1: ('ebay', 90)
    match2: ('ebay', 90)

As you can see, match1 should be the closer match, but both get a score of 90. Is there any way to use this library to give more weight to whole-word matches, and thereby a higher score to match1, or is this a bug?

Thanks.

acslater00 commented 11 years ago

process.extractOne takes a scorer keyword argument for a custom scoring function. By default it uses WRatio, which may just not be great for your specific application. You can experiment with others. This is not a bug with the library, so I'm going to close the issue.

There are definitely some scoring algorithms (not implemented) that will tokenize a string and then only give 'credit' for a complete token match. If you implemented one I'd happily consider adding it to the library.
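For anyone landing here later, a token-based scorer along those lines could be sketched roughly as follows. This is only a minimal illustration, not something that exists in the library: the names tokenize and whole_token_score are hypothetical, and the scoring rule (the percentage of the choice's tokens that appear as complete tokens in the query) is just one assumed way to give credit only for whole-token matches.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def whole_token_score(query, choice):
    """Hypothetical scorer: return an int from 0 to 100.

    The score is the percentage of the choice's tokens that occur
    as complete tokens in the query. Partial overlaps such as
    "ebay" inside "thebay" earn no credit.
    """
    query_tokens = set(tokenize(query))
    choice_tokens = set(tokenize(choice))
    if not choice_tokens:
        return 0
    matched = len(choice_tokens & query_tokens)
    return round(100 * matched / len(choice_tokens))


# The two queries from the original report, scored against "ebay".
print(whole_token_score("ebay - asdlfjlksj ", "ebay"))    # 100
print(whole_token_score("thebay - asdlfjlksj ", "ebay"))  # 0
```

Since it takes two strings and returns a 0-100 score, a function like this should be usable as process.extractOne(query, choices, scorer=whole_token_score). An all-or-nothing rule may be too strict on its own, so blending it with one of the built-in ratios is probably worth trying.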