Open haifenghuang opened 6 years ago
Sorry, it doesn't support Chinese for now. English uses spaces to separate words, but Chinese does not. I did consider adding this feature to this repo, but in the end I thought it would be better to build a new tool to extract Chinese sentences.
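To make the point about spaces concrete, here is a minimal Go sketch. It is only an illustration, not code from this repo, and `countSpaceSeparatedWords` is a hypothetical helper name. It shows that whitespace splitting finds three words in the English sentence but treats the whole Chinese sentence as a single token, so a matcher that relies on space-delimited word boundaries has nothing to anchor on.

```go
package main

import (
	"fmt"
	"strings"
)

// countSpaceSeparatedWords is a hypothetical helper for this sketch.
// It returns how many tokens whitespace splitting yields.
func countSpaceSeparatedWords(s string) int {
	return len(strings.Fields(s))
}

func main() {
	// English: spaces separate the three words.
	fmt.Println(countSpaceSeparatedWords("welcome to beijing")) // 3
	// Chinese: no spaces, so the whole sentence is one token.
	fmt.Println(countSpaceSeparatedWords("欢迎来北京")) // 1
}
```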
As the title suggests, take the code below:
keyword_processor := NewKeywordProcessor()
keyword_processor.AddKeywords("欢迎")
keyword_processor.AddKeywords("来")
keyword_processor.AddKeywords("北京")
result := keyword_processor.ExtractKeywords("欢迎来北京")
for _, v := range result {
	e := ExtractResult(v)
	fmt.Printf("return : %s\n", e.Keyword)
}
There is nothing in the output, because len(result) = 0. If we change the above keywords to English:
keyword_processor := NewKeywordProcessor()
keyword_processor.AddKeywords("welcome")
keyword_processor.AddKeywords("to")
keyword_processor.AddKeywords("beijing")
result := keyword_processor.ExtractKeywords("welcome to beijing")
for _, v := range result {
	e := ExtractResult(v)
	fmt.Printf("return : %s\n", e.Keyword)
}
The result is:
return : welcome
return : to
return : beijing
Hi haifenghuang, I recently did similar work on flashtext with Chinese support.
keywordProcessor := gf.NewKeywordProcessor()
keywordProcessor.AddKeyword("欢迎")
keywordProcessor.AddKeyword("来")
keywordProcessor.AddKeyword("北京")
result := keywordProcessor.ExtractKeywords("欢迎来北京")
for _, v := range result {
fmt.Printf("return : %s\n", v)
}
And the result is
return : 欢迎
return : 来
return : 北京
The package is here.
Besides, I used PyFlashtext, which has similar problems with Chinese, and I fixed it. To improve performance in my production environment, I rewrote the FlashText algorithm in Go instead of Python, and it works well. You are welcome to use go-flashtext.