Closed psalias2006 closed 4 years ago
Flashtext does not know the Greek alphabet.
from flashtext import KeywordProcessor
keyword_processor = KeywordProcessor()
keyword_processor.set_non_word_boundaries(set('πομολο'))  # or the entire script if you want
print(keyword_processor.non_word_boundaries)
>> {'μ', 'π', 'ο', 'λ'}
keyword_processor.add_keywords_from_list(['πο'])
keyword_processor.extract_keywords('πο μολο')
>> ['πο']
If you don't put the space between πο and μολο, extract_keywords() will not yield a result, because μ is not a word boundary. If you want to understand how flashtext works, this will get you started.
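To make the boundary rule concrete, here is a minimal pure-Python sketch of whole-word matching. It is a simplified illustration of the idea, not flashtext's actual implementation; `find_whole_word`, `ASCII_WORD_CHARS` and `thread_chars` are hypothetical names made up for this example.

```python
import string

# Assumption for illustration: an ASCII-only set of "word" characters
# (digits, ASCII letters, underscore), i.e. a set with no Greek letters in it.
ASCII_WORD_CHARS = set(string.digits + string.ascii_letters + "_")


def find_whole_word(keyword, text, word_chars):
    """Return the start offsets where keyword occurs as a whole word in text.

    An occurrence counts only if the characters directly before and after it
    are NOT in word_chars, or if it touches the start or end of text.
    """
    hits = []
    start = text.find(keyword)
    while start != -1:
        end = start + len(keyword)
        before_ok = start == 0 or text[start - 1] not in word_chars
        after_ok = end == len(text) or text[end] not in word_chars
        if before_ok and after_ok:
            hits.append(start)
        start = text.find(keyword, start + 1)
    return hits


# The same characters that were passed to set_non_word_boundaries() above.
thread_chars = set("πομολο")

print(find_whole_word("πο", "πο μολο", thread_chars))     # [0]: the space ends the word
print(find_whole_word("πο", "πομολο", thread_chars))      # []: μ continues the word
print(find_whole_word("πο", "πομολο", ASCII_WORD_CHARS))  # [0]: μ is treated as a boundary
```

In this sketch, the last call shows what goes wrong with an ASCII-only set: every Greek letter counts as a boundary, so a keyword can match in the middle of a longer word.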
Thanks for the help! I'll give it a try
Based on @iwpnd's post, I have this workaround for Greek words:
import string
from flashtext import KeywordProcessor
greek_simple = 'ΑαΒβΓγΔδΕεΖζΗηΘθΙιΚκΛλΜμΝνΞξΟοΠπΡρΣσςΤτΥυΦφΧχΨψΩω'
greek_accent = 'άέήίόύώΆΈΉΊΌΎΏ'
non_word_boundaries = set(string.digits + string.ascii_letters + greek_simple + greek_accent + '_')
keyword_processor = KeywordProcessor()
keyword_processor.set_non_word_boundaries(non_word_boundaries)
thanks again!
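The hand-written lists above leave out some accented forms, for example letters with dialytika such as ϊ and ϋ. As a possible alternative, the character set could be derived from the Unicode database instead of being typed by hand. This is only a sketch: `GREEK_BLOCKS` and `greek_letters` are hypothetical names, and the two ranges are the Unicode "Greek and Coptic" and "Greek Extended" blocks, so the result also contains Coptic and archaic letters that may not be wanted.

```python
import unicodedata

# Code point ranges of the "Greek and Coptic" and "Greek Extended" blocks.
GREEK_BLOCKS = [(0x0370, 0x03FF), (0x1F00, 0x1FFF)]


def greek_letters():
    """Collect every character in the Greek blocks whose category is a letter."""
    letters = set()
    for first, last in GREEK_BLOCKS:
        for code_point in range(first, last + 1):
            char = chr(code_point)
            # Categories starting with "L" are letters; this skips unassigned
            # code points, punctuation and standalone accent marks.
            if unicodedata.category(char).startswith("L"):
                letters.add(char)
    return letters


greek_chars = greek_letters()

# U+03CA is iota with dialytika, which the hand-written lists do not contain.
print("\u03ca" in greek_chars)
```

The resulting set could be combined with `string.digits`, `string.ascii_letters` and `'_'` and passed to `set_non_word_boundaries()` in the same way as the hand-written set.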
Hi, I think it is not working as it is supposed to with the Greek language.