vi3k6i5 / flashtext

Extract Keywords from sentence or Replace keywords in sentences.
MIT License
5.57k stars, 599 forks

Consecutive Replacements with Empty Non Word Boundaries #65

xokocodo closed 5 years ago

xokocodo commented 5 years ago

I have a use case where I need to replace "words" that could exist without spaces in between. I used set_non_word_boundaries to make the set of non-word-boundary characters empty, and it works for the most part. There is a corner case that fails, though.

If the keyword is repeated back to back, every second consecutive instance is skipped.

    >>> from flashtext import KeywordProcessor
    >>> replacer = KeywordProcessor(case_sensitive=True)
    >>> replacer.set_non_word_boundaries(set())
    >>> replacer.add_keyword('old', 'new')
    True
    >>> replacer.replace_keywords("old old old old")
    'new new new new'
    >>> replacer.replace_keywords("oldoldoldold")
    'newoldnewold'

vi3k6i5 commented 5 years ago

The tool is no good if there is no word boundary. All optimizations are designed around the word boundary. Maybe just use standard regex or plain string search in Python; it will work better. :)
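
For reference, here is a minimal sketch of the regex route suggested above, using only the standard library. It is not code from this thread or from flashtext. The helper name `replace_keywords` and the dict-based mapping are assumptions made for illustration. Longer keywords are tried first so that a keyword is not shadowed by its own prefix.

```python
import re


def replace_keywords(text, mapping):
    """Replace every occurrence of each key in `mapping`, ignoring word boundaries.

    Hypothetical helper for illustration; not part of flashtext.
    """
    if not mapping:
        # An empty alternation would match the empty string everywhere.
        return text
    # Try longer keywords first so "older" wins over "old" where both apply.
    keys = sorted(mapping, key=len, reverse=True)
    # re.escape keeps keywords containing regex metacharacters literal.
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    # Matches are non-overlapping and scanned left to right.
    return pattern.sub(lambda m: mapping[m.group(0)], text)


print(replace_keywords("old old old old", {"old": "new"}))  # new new new new
print(replace_keywords("oldoldoldold", {"old": "new"}))     # newnewnewnew
```

For a single keyword, `"oldoldoldold".replace("old", "new")` gives the same result without a regex. The regex version is mainly useful when several keywords need to be replaced in one pass.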