Closed: ilovefood2 closed this issue 4 years ago
IMHO, flashtext is built on a trie of words, which means every string must first be tokenized before it can be searched for keywords. Unfortunately, the string you gave cannot be tokenized into words, so FlashText is not the right tool for it. Why not just use a regex?
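To make the regex suggestion concrete, here is a minimal sketch using only Python's standard re module. The helper name find_keywords is made up for illustration and is not part of flashtext. It matches keywords anywhere in the string, without word boundaries, which is exactly what flashtext does not do:

```python
import re

def find_keywords(text, keywords):
    """Return every keyword occurrence in text, ignoring case and word boundaries."""
    # re.escape keeps each keyword literal; there are no \b anchors,
    # so a keyword embedded in a longer run of letters still matches.
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    return [m.group(0) for m in pattern.finditer(text)]

print(find_keywords("todayIgotEmailreport", ["email", "report"]))
# ['Email', 'report']
```

Note that matches are returned as they appear in the text (so "Email", not "email"), and that this approach will also report keywords buried inside unrelated words.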
Flashtext is dependent on tokenization and uses non-word-boundaries for that. I just described its functionality over in #97.
So, following this train of thought, you would only be able to find "Email" if you removed "t" and "r" from KeywordProcessor().non_word_boundaries. This would treat "t" and "r" as word boundaries, comparable to whitespace, commas, exclamation marks, etc. For obvious reasons, this would not be beneficial if you wanted to use that instance of KeywordProcessor() for other tasks, since every "t" and "r" would then split words there as well.
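The effect can be illustrated with a simplified model of boundary-based tokenization. This is a sketch, not flashtext's actual implementation: tokenize and contains_keyword are hypothetical helpers, and the default set below (ASCII letters, digits and underscore) is my assumption about what flashtext uses for non_word_boundaries.

```python
import string

# Assumed default: characters that may appear inside a word.
DEFAULT_NON_WORD_BOUNDARIES = set(string.ascii_letters + string.digits + "_")

def tokenize(text, non_word_boundaries):
    """Split text at every character that is not in non_word_boundaries."""
    tokens, current = [], []
    for ch in text:
        if ch in non_word_boundaries:
            current.append(ch)
        else:
            # Boundary character: close the current token, if any.
            if current:
                tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    return tokens

def contains_keyword(text, keyword, non_word_boundaries):
    """Case-insensitive check: is keyword one of the whole tokens of text?"""
    return keyword.lower() in tokenize(text.lower(), non_word_boundaries)

text = "todayIgotEmailreport"
relaxed = DEFAULT_NON_WORD_BOUNDARIES - {"t", "r"}

print(contains_keyword(text, "email", DEFAULT_NON_WORD_BOUNDARIES))  # False
print(contains_keyword(text, "email", relaxed))                      # True
print(tokenize("report", relaxed))                                   # ['epo']
```

The last line shows the side effect: once "t" and "r" count as boundaries, a word such as "report" is no longer seen as a single token, which is why the modified instance is of little use for anything else.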
close #96
For example, I have the string "todayIgotEmailreport". How do I get the keyword "email" from this string?
If I use str.contains('report', False, regex=True), it returns this string. How can we do the same with flashtext?