seatgeek / fuzzywuzzy

Fuzzy String Matching in Python
http://chairnerd.seatgeek.com/fuzzywuzzy-fuzzy-string-matching-in-python/
GNU General Public License v2.0
9.21k stars 874 forks

Extract matched phrase #259

Open spooknik opened 4 years ago

spooknik commented 4 years ago

It would be really handy if the extract functions could also return the matched phrase.

For example:


>>> choices = ["Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys"]
>>> process.extractMatch("New york Jets are a sportball team.", choices)
['New York Jets', 'New york Jets', '91']

lutzen101 commented 4 years ago

In my project I use spaCy to classify named entities in my data. To recognize "custom" entities properly, I would like to match against a dictionary using fuzzywuzzy. If the matching result included the "matched phrase", I could use it to create an entity directly. Right now I have to build custom logic to get the matched phrase, which is obviously not that efficient.
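For anyone wondering what that custom logic can look like, here is a minimal sketch using only the standard library. `extract_match` is a hypothetical helper, not part of fuzzywuzzy, and it scores with `difflib.SequenceMatcher` on lowercased text, so the numbers will not necessarily agree with fuzzywuzzy's scorers. It slides a window of as many words as each choice has across the query and keeps the best-scoring window.

```python
from difflib import SequenceMatcher


def extract_match(query, choices):
    """Return (choice, matched_phrase, score) for the best-scoring window.

    Hypothetical helper, not a fuzzywuzzy function. Scores are 0-100 and
    come from difflib, compared case-insensitively.
    """
    words = query.split()
    best = (None, "", 0)
    for choice in choices:
        n = len(choice.split())
        # Slide a window of n words across the query (at least one window).
        for i in range(max(len(words) - n + 1, 1)):
            phrase = " ".join(words[i:i + n])
            ratio = SequenceMatcher(None, choice.lower(), phrase.lower()).ratio()
            score = round(100 * ratio)
            # Strict comparison keeps the first window that reaches a score.
            if score > best[2]:
                best = (choice, phrase, score)
    return best


choices = ["Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys"]
result = extract_match("New york Jets are a sportball team.", choices)
print(result)  # ('New York Jets', 'New york Jets', 100)
```

This only looks at windows with the same word count as the choice, so it will miss phrases that gain or lose a word. It also does a full comparison per window per choice, which is fine for a short dictionary but slow for a large one.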

spooknik commented 4 years ago

Thanks for the reply and for explaining your workflow.

For my purposes, I just want a match so I can replace the found phrase with the search phrase. I found that fuzzysearch does exactly what I wanted. It returns the start and end indices of each match, which I can use to extract the matched phrase and pass it to the replace function.

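from fuzzysearch import find_near_matches

term = "New York Jets"
text = "New york Jets are a sportball team."
max_distance = 1  # assumed value; the original snippet did not define it

# Each match carries start and end offsets into text.
matches = find_near_matches(term, text, max_l_dist=max_distance)
phrase = [text[m.start:m.end] for m in matches]
print(phrase)
['New york Jets']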
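The replace step itself can be sketched like this. `replace_spans` is a hypothetical helper name, and the span is written out by hand so the example runs without fuzzysearch installed. In practice the spans would be built from the matches above, for example `[(m.start, m.end) for m in matches]`.

```python
def replace_spans(text, spans, replacement):
    """Replace each (start, end) slice of text with replacement."""
    # Work right to left so earlier offsets stay valid as the length changes.
    for start, end in sorted(spans, reverse=True):
        text = text[:start] + replacement + text[end:]
    return text


term = "New York Jets"
text = "New york Jets are a sportball team."
# Offsets of 'New york Jets' in text, hard-coded here for illustration.
spans = [(0, 13)]
fixed = replace_spans(text, spans, term)
print(fixed)  # New York Jets are a sportball team.
```

Slicing by offset rather than calling `str.replace` on the extracted phrase means only the matched occurrences change, even if the same phrase appears elsewhere in the text. This sketch assumes the spans do not overlap.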
...