vi3k6i5 / flashtext

Extract Keywords from sentence or Replace keywords in sentences.
MIT License
5.58k stars 598 forks

How to find a keyword inside a string, like regex does? #96

Closed. ilovefood2 closed this issue 4 years ago

ilovefood2 commented 4 years ago

For example, I have the string "todayIgotEmailreport". How do I get the keyword "email" out of this string? If I use str.contains('report', False, regex=True), it matches this string. How can we do the same with flashtext?

wangpeipei90 commented 4 years ago

IMHO, flashtext is built on a trie of words. That means every string must first be tokenized into words before the keywords are searched. Unfortunately, the string in your example cannot be tokenized into words, because it has no separators between them. Hence, FlashText is not the right tool for it. Why not just use regex?
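As a rough sketch of the regex route suggested above, the snippet below searches for keywords as plain substrings using only Python's standard `re` module. The keyword list and variable names are illustrative assumptions, not something taken from the original question.

```python
import re

# Illustrative inputs (assumed for this sketch).
text = "todayIgotEmailreport"
keywords = ["email", "report"]

# Build one alternation pattern; re.escape guards against regex
# metacharacters inside keywords. No \b anchors are used, so matches
# are allowed in the middle of a longer run of letters.
pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)

# Collect each match as it appears in the text, in order.
matches = [m.group(0) for m in pattern.finditer(text)]
print(matches)  # ['Email', 'report']
```

Because there are no word-boundary anchors, this matches substrings anywhere in the text, which is the behaviour the question asks for and the opposite of flashtext's whole-word matching.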

iwpnd commented 4 years ago

Flashtext depends on tokenization and uses non-word-boundaries for that. I just described how this works over in #97. Following this train of thought, you would only be able to find "Email" if you removed "t" and "r" from KeywordProcessor().non_word_boundaries. That would treat "t" and "r" as word boundaries, comparable to whitespace, commas, exclamation marks, etc. For obvious reasons, this would not be beneficial if you wanted to use that instance of KeywordProcessor() for other tasks.
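To make the effect of removing "t" and "r" concrete, here is a toy model of boundary-based matching written with the standard library only. It is not FlashText's actual implementation; `find_keywords` and its parameters are hypothetical names for this sketch, and the default character set (letters, digits, underscore) is an assumption about what counts as "inside a word".

```python
import string

def find_keywords(text, keywords, non_word_boundaries):
    """Toy model: split text at every character that is NOT in
    non_word_boundaries, then keep the tokens that are keywords."""
    tokens, current = [], []
    for ch in text:
        if ch in non_word_boundaries:
            current.append(ch)      # still inside a word
        elif current:
            tokens.append("".join(current))  # boundary ends the word
            current = []
    if current:
        tokens.append("".join(current))
    wanted = {k.lower() for k in keywords}
    return [t for t in tokens if t.lower() in wanted]

# Assumed default: letters, digits and underscore are word characters.
default_chars = set(string.ascii_letters + string.digits + "_")
text = "todayIgotEmailreport"

# With the default set the whole string is a single token, so no match.
print(find_keywords(text, ["email"], default_chars))  # []

# Treating "t" and "r" as boundaries isolates "Email" as its own token.
relaxed = default_chars - {"t", "r"}
print(find_keywords(text, ["email"], relaxed))  # ['Email']

# The side effect: words containing "t" or "r" are now split apart.
print(find_keywords("report", ["report"], relaxed))  # []
```

The last call illustrates the drawback mentioned above: once "t" and "r" act as boundaries, a keyword such as "report" can no longer be matched as a whole word.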

pcakhilnadh commented 4 years ago

close #96