I've encountered an issue with FlashText where it does not recognize or match Unicode combining character sequences. The specific test case is the sequence \u0069\u0307 (U+0069 LATIN SMALL LETTER I followed by U+0307 COMBINING DOT ABOVE), which renders as 'i̇' (a dotted i). Despite adding this sequence as a keyword, FlashText fails to find any matches in the text.
Steps to Reproduce:
Import the FlashText library and instantiate the KeywordProcessor:
from flashtext import KeywordProcessor
keyword_processor = KeywordProcessor()
Add the combined Unicode character as a keyword:
keyword_processor.add_keyword("\u0069\u0307", "i")
# Alternative attempt: keyword_processor.add_keyword("i̇", "i")
Apply the keyword processor to a sample string containing the character:
Observe the output:
Expected Behavior:
The keyword processor should recognize the combined Unicode character \u0069\u0307 (i̇) in the string and match it accordingly.
Actual Behavior:
No matches are found, and the output is an empty list ([]).
Additional Information:
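For context, here is a minimal standard-library sketch of how Python itself sees the sequence used as the keyword. It does not call FlashText and is not a diagnosis of the failure; the variable names (dotted_i, code_points, and so on) are illustrative only, and whether any of these properties is related to the missing match is not established here.

```python
import unicodedata

# The keyword from the report: a base letter followed by a combining mark.
dotted_i = "\u0069\u0307"

# Python treats the sequence as two separate code points, not one character.
code_points = [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in dotted_i]
print(len(dotted_i), code_points)

# NFC normalization leaves the sequence unchanged, so normalizing the input
# to NFC does not collapse it into a single code point.
nfc_unchanged = unicodedata.normalize("NFC", dotted_i) == dotted_i
print("unchanged by NFC:", nfc_unchanged)

# Lowercasing U+0130 (capital I with dot above) yields this same
# two-code-point sequence, so it can also appear after a .lower() call.
lowered_matches = "\u0130".lower() == dotted_i
print("equals lowercased U+0130:", lowered_matches)
```

In other words, any code that walks the text one code point at a time will see 'i' and the combining dot as two separate items.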