vi3k6i5 / flashtext

Extract Keywords from sentence or Replace keywords in sentences.
MIT License
5.57k stars 598 forks

The two-character Chinese word “成都” is not recognized #132

Open GuoPL opened 2 years ago

GuoPL commented 2 years ago

from flashtext import KeywordProcessor

# The first sample text is overwritten by the assignment that follows it.
text = "@苍月轶 再次核实:骆然5月8日持24小时核酸从宜昌回蓉,到成都24小时内核酸一次,9号回泸定,24小时内又做一次核酸,均阴性,健康码绿码。宜昌不是AB区域。"
text = "成都到北京高铁3小时,郑州到成都2小时"

print(text)
kp = KeywordProcessor()
# Each keyword maps to a (clean name, tag) tuple.
kp.add_keyword("到成都", ("成都", "ab"))
kp.add_keyword("宜昌", ("宜昌", "ab"))

print(len(kp))
print(kp)
word_index = kp.extract_keywords(text, span_info=True)
print(word_index)
# Print the slice of text covered by each reported span.
for item in word_index:
    print(text[item[1]:item[2]])

print('finished')

githublyff commented 1 year ago

from flashtext import KeywordProcessor

text = "成都到北京高铁3小时,郑州到成都2小时"
kp = KeywordProcessor()
kp.add_keyword("到成都", ("成都", "ab"))
kp.add_keyword("宜昌", ("宜昌", "ab"))

print(len(kp))
keywords_found = kp.extract_keywords(text, span_info=True)
for item in keywords_found:
    print(item)

Output:

2
(('成都', 'ab'), 13, 15)

Reference: https://blog.csdn.net/chen10314/article/details/122048726

zhangbo2008 commented 1 year ago

This is still not a good solution, because many special characters appear in our keywords, such as ( ), [ ], and so on.
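For keywords that contain special characters, or text in a script that does not separate words with spaces, one possible approach is plain substring matching with no word-boundary check. The following is only a minimal sketch using Python's standard `re` module, not flashtext itself; the function name `find_keywords` and the dict layout are hypothetical, and it makes no attempt to match flashtext's trie-based performance. `re.escape` makes characters such as `(` and `[` literal, and sorting the alternatives longest-first prefers the longest keyword at a given position.

```python
import re


def find_keywords(text, keywords):
    """Return (clean_name, start, end) for each non-overlapping keyword hit.

    `keywords` maps a literal keyword string to its clean name. Matching is
    plain substring search with no word-boundary check (an assumption of this
    sketch, unlike flashtext's boundary-based matching).
    """
    if not keywords:
        return []
    # Longest keywords first, so the alternation prefers the longest match.
    ordered = sorted(keywords, key=len, reverse=True)
    # re.escape turns characters like ( ) [ ] into literals.
    pattern = re.compile("|".join(re.escape(k) for k in ordered))
    return [(keywords[m.group(0)], m.start(), m.end())
            for m in pattern.finditer(text)]


# Same text and keywords as in the issue.
text = "成都到北京高铁3小时,郑州到成都2小时"
keywords = {"到成都": ("成都", "ab"), "宜昌": ("宜昌", "ab")}
hits = find_keywords(text, keywords)
for clean_name, start, end in hits:
    print(clean_name, start, end, text[start:end])

# Keywords containing parentheses and brackets.
special_hits = find_keywords("see f(x) and a[0]", {"f(x)": "call", "a[0]": "index"})
print(special_hits)
```

Because no boundary check is applied, this sketch also matches keywords embedded inside longer words, which may or may not be desirable for mixed Chinese and English text.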