vi3k6i5 / flashtext

Extract Keywords from sentence or Replace keywords in sentences.
MIT License
5.58k stars 598 forks

Can I use stemmed version of keyphrases to extract them? #67

Open renatofcorrea opened 5 years ago

renatofcorrea commented 5 years ago

Hello, I have a question: can I use stemmed versions of keyphrases to extract them? Sometimes it is useful to use a stem to capture equivalent expressions with variations, for example {digital library: digital librar}. This pattern would match: digital library, digital libraries, digitalized library.
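To make the request concrete, here is a minimal standalone sketch of stem-based keyphrase matching. It does not use flashtext's API: `crude_stem` and `extract_stemmed` are hypothetical helpers written for illustration, and the suffix list is a toy stand-in for a real stemmer. Both the keyphrases and the input text are stemmed token by token, and matching is then done on the stemmed token sequences.

```python
import re

TOKEN = re.compile(r"[A-Za-z]+")

def crude_stem(token):
    """Hypothetical toy stemmer: lowercase and strip one common English suffix."""
    token = token.lower()
    for suffix in ("ies", "es", "s", "y"):
        # Keep at least three characters so short words are left alone.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def extract_stemmed(text, keyphrases):
    """Return the canonical keyphrases whose stemmed tokens occur in text."""
    # Map each stemmed token sequence back to the keyphrase as written.
    index = {
        tuple(crude_stem(t) for t in TOKEN.findall(phrase)): phrase
        for phrase in keyphrases
    }
    max_len = max((len(key) for key in index), default=0)
    tokens = [crude_stem(t) for t in TOKEN.findall(text)]
    found = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate window first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in index:
                found.append(index[key])
                i += n
                break
        else:
            i += 1
    return found

print(extract_stemmed("Digital libraries and a digital library", ["digital library"]))
print(extract_stemmed("raining cats and dogs", ["cat", "dog"]))
```

With this toy stemmer, "digital library" and "digital libraries" both reduce to the same token sequence, but "digitalized library" does not; covering that variant would need a more aggressive stemmer.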

remiadon commented 5 years ago

Hi, this issue is about fuzzy matching integration

For my first PR I would like to submit a simple version of fuzzy matching, but if it is accepted, we can move forward and try to integrate custom weights for insertions, deletions, and replacements.

I think this feature, applied with low weights for insertions, would "work" for your kind of problem, because added characters would only slightly increase the Levenshtein distance as it is computed, so a fuzzy match would still be possible.
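As a rough illustration of the idea, the sketch below computes a Levenshtein distance with separate costs per operation. It is not the code from the PR: `weighted_levenshtein` and its parameter names are assumptions made for this example. With a low insertion cost, a plural such as "cats" stays very close to the keyword "cat".

```python
def weighted_levenshtein(source, target,
                         insert_cost=1.0, delete_cost=1.0, replace_cost=1.0):
    """Edit distance from source to target with per-operation costs.

    Hypothetical helper for illustration; not flashtext's implementation.
    """
    # prev[j] = cost of turning the processed prefix of source into target[:j]
    prev = [j * insert_cost for j in range(len(target) + 1)]
    for i, s in enumerate(source, 1):
        curr = [i * delete_cost]
        for j, t in enumerate(target, 1):
            curr.append(min(
                prev[j] + delete_cost,        # drop s from source
                curr[j - 1] + insert_cost,    # add t to reach target
                prev[j - 1] + (0.0 if s == t else replace_cost),
            ))
        prev = curr
    return prev[-1]

# Unit costs give the classic distance.
print(weighted_levenshtein("kitten", "sitting"))
# Cheap insertions keep suffixed variants close to the keyword.
print(weighted_levenshtein("cat", "cats", insert_cost=0.1))
print(weighted_levenshtein("library", "libraries", insert_cost=0.1))
```

Note that "library" to "libraries" still pays a full replacement for "y", so cheap insertions alone help most when the variant only appends characters.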

ecwootten commented 2 years ago

Unless I am misunderstanding, the fuzzy matching added in PR #84 doesn't seem to work well for this kind of problem:

>>> import flashtext
>>> processor = flashtext.KeywordProcessor()
>>> processor.add_keywords_from_list(['cat', 'dog'])
>>> processor.extract_keywords('fight like cat and dog')
['cat', 'dog']
>>> processor.extract_keywords('raining cats and dogs')
[]
>>> processor.extract_keywords('raining cats and dogs', max_cost=2)
[]
>>> processor.extract_keywords('raining cats and dogs', max_cost=20)
['cat', 'cat']
>>> processor.extract_keywords('raining cats and dogs', max_cost=200)
['cat', 'cat']
>>> processor.extract_keywords('raining frogs and dogs', max_cost=200)
['cat', 'cat', 'cat']