This module can be used to replace keywords in sentences or extract keywords from sentences. It is based on the FlashText algorithm.
Compared with the standard FlashText algorithm, go-flashtext has some differences that make it more powerful:
To install the go-flashtext package, first install Go and set up your Go workspace.
$ go get -u github.com/waltsmith88/go-flashtext
import gf "github.com/waltsmith88/go-flashtext"
package main
import (
"fmt"
gf "github.com/waltsmith88/go-flashtext"
)
func main() {
// add keywords from Map
keywordMap := map[string]string{
"love": "love",
"hello": "hello",
}
keywordProcessor := gf.NewKeywordProcessor()
keywordProcessor.AddKeywordsFromMap(keywordMap)
foundList := keywordProcessor.ExtractKeywords("I love coding.")
fmt.Println(foundList)
}
// [love]
package main
import (
"fmt"
gf "github.com/waltsmith88/go-flashtext"
)
func main() {
// add keywords from Map
keywordMap := map[string]string{
"love": "love",
"中国": "中文",
}
keywordProcessor := gf.NewKeywordProcessor()
keywordProcessor.AddKeywordsFromMap(keywordMap)
keywordProcessor.AddKeyword("love", "ove")
foundList := keywordProcessor.ExtractKeywords("I Love 中国.")
fmt.Println(foundList)
}
// [中文]
package main
import (
"fmt"
gf "github.com/waltsmith88/go-flashtext"
)
func main() {
// add keywords from Map
keywordMap := map[string]string{
"love": "love",
"中国": "中文",
}
keywordProcessor := gf.NewKeywordProcessor()
keywordProcessor.SetCaseSensitive(false)
keywordProcessor.AddKeywordsFromMap(keywordMap)
keywordProcessor.AddKeyword("love", "ove")
foundList := keywordProcessor.ExtractKeywords("I Love 中国.")
fmt.Println(foundList)
}
// [love|ove 中文]
package main
import (
"fmt"
gf "github.com/waltsmith88/go-flashtext"
)
func main() {
// add keywords from Map
keywordMap := map[string]string{
"love": "love",
"中国": "中文",
}
keywordProcessor := gf.NewKeywordProcessor()
keywordProcessor.SetUniqueKeyword(true)
keywordProcessor.SetCaseSensitive(false)
keywordProcessor.AddKeywordsFromMap(keywordMap)
keywordProcessor.AddKeyword("love", "ove")
foundList := keywordProcessor.ExtractKeywords("I Love 中国.")
fmt.Println(foundList)
}
// [ove 中文]
package main
import (
"fmt"
gf "github.com/waltsmith88/go-flashtext"
)
func main() {
// add keywords from Map
keywordMap := map[string]string{
"love": "love",
"中国": "中文",
}
keywordProcessor := gf.NewKeywordProcessor()
keywordProcessor.AddKeywordsFromMap(keywordMap)
sentence := "I love 中国."
cleanNameRes := keywordProcessor.ExtractKeywordsWithSpanInfo(sentence)
sentence1 := []rune(sentence)
for _, resSpan := range cleanNameRes {
fmt.Println(resSpan.CleanName, resSpan.StartPos, resSpan.EndPos, fmt.Sprintf("%c", sentence1[resSpan.StartPos:resSpan.EndPos]))
}
}
// love 2 6 [l o v e]
// 中文 7 9 [中 国]
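The output above suggests that StartPos and EndPos count runes rather than bytes: the example slices a []rune, and 中国 spans 7 to 9 even though it occupies six bytes. If you need byte offsets instead (for example to slice the original string directly), a conversion along these lines could help. This is only a sketch; runeSpanToBytes is a hypothetical helper and not part of go-flashtext.

```go
package main

import "fmt"

// runeSpanToBytes converts a [start, end) span counted in runes into the
// equivalent byte offsets of s. It returns ok=false if the span is out of range.
// Hypothetical helper for illustration; not part of go-flashtext.
func runeSpanToBytes(s string, start, end int) (byteStart, byteEnd int, ok bool) {
	if start < 0 || end < start {
		return 0, 0, false
	}
	byteStart, byteEnd = -1, -1
	runeIdx := 0
	// Ranging over a string yields the byte index of each rune.
	for byteIdx := range s {
		if runeIdx == start {
			byteStart = byteIdx
		}
		if runeIdx == end {
			byteEnd = byteIdx
		}
		runeIdx++
	}
	// A span may start or end exactly at the end of the string.
	if runeIdx == start {
		byteStart = len(s)
	}
	if runeIdx == end {
		byteEnd = len(s)
	}
	if byteStart < 0 || byteEnd < 0 {
		return 0, 0, false
	}
	return byteStart, byteEnd, true
}

func main() {
	sentence := "I love 中国."
	bs, be, ok := runeSpanToBytes(sentence, 7, 9)
	fmt.Println(bs, be, ok, sentence[bs:be])
}

// 7 13 true 中国
```

For short sentences, converting to []rune as in the example above is simpler; the byte-offset form mainly matters when you want to avoid copying long inputs.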
// way 1: from Map
keywordMap := map[string]string{
"abcd": "abcd",
"student": "stu",
}
keywordProcessor.AddKeywordsFromMap(keywordMap)
// way 2: from Slice
keywordProcessor.AddKeywordsFromList([]string{"student", "abcd", "abc", "中文"})
// way 3: from file, one entry per line in the form: keyword => cleanName
keywordProcessor.AddKeywordsFromFile(filePath)
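To make the "keyword => cleanName" line format concrete, here is a small standalone sketch that parses such lines into a map, which could then be passed to AddKeywordsFromMap. It does not reproduce the library's own file reader. The helper name parseKeywordLine is hypothetical, and two behaviors are assumptions: whitespace around each part is ignored, and a line without "=>" maps the keyword to itself.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseKeywordLine splits one "keyword => cleanName" line.
// Assumptions (not confirmed by the library): surrounding whitespace is
// ignored, and a line without "=>" maps the keyword to itself.
func parseKeywordLine(line string) (keyword, cleanName string, ok bool) {
	line = strings.TrimSpace(line)
	if line == "" {
		return "", "", false
	}
	parts := strings.SplitN(line, "=>", 2)
	keyword = strings.TrimSpace(parts[0])
	if keyword == "" {
		return "", "", false
	}
	cleanName = keyword
	if len(parts) == 2 {
		if name := strings.TrimSpace(parts[1]); name != "" {
			cleanName = name
		}
	}
	return keyword, cleanName, true
}

func main() {
	// Stand-in for the contents of a keyword file.
	data := "student => stu\nabcd\n\n中国 => 中文\n"
	keywordMap := make(map[string]string)
	scanner := bufio.NewScanner(strings.NewReader(data))
	for scanner.Scan() {
		if k, v, ok := parseKeywordLine(scanner.Text()); ok {
			keywordMap[k] = v
		}
	}
	fmt.Println(len(keywordMap), keywordMap["student"], keywordMap["abcd"], keywordMap["中国"])
}

// 3 stu abcd 中文
```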
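// remove a single keyword
keywordProcessor.RemoveKeyword("abc")
// remove several keywords at once
keywordProcessor.RemoveKeywordFromList([]string{"student", "abcd", "abc", "中文"})
// replace keywords in a sentence with their clean names
newSentence := keywordProcessor.ReplaceKeywords(sourceSentence)
// number of keywords currently stored
keywordProcessor.Len()
// check whether a keyword is present
keywordProcessor.IsContains("abc")
// list all stored keywords
keywordProcessor.GetAllKeywords()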
More usage examples are available in go-flashtext/examples/examples.go. You can try them with the following command:
$ go run examples/examples.go
$ git clone https://github.com/waltsmith88/go-flashtext
$ cd go-flashtext
$ go test -v
FlashText is a custom algorithm based on the Aho-Corasick algorithm and a trie dictionary.
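The trie-dictionary part of that idea can be sketched in a few lines of plain Go. The code below is a simplified illustration, not go-flashtext's actual implementation: it walks a rune-keyed trie from each word start and keeps the longest keyword that ends on a word boundary, without Aho-Corasick failure links. All names (trieNode, addKeyword, extract, isWordRune) are hypothetical, and the word-boundary rule (letters, digits and underscore count as word characters) is an assumption.

```go
package main

import (
	"fmt"
	"unicode"
)

// trieNode is one node of a rune-keyed trie. cleanName is meaningful only
// on nodes that terminate a keyword (isEnd == true).
type trieNode struct {
	children  map[rune]*trieNode
	cleanName string
	isEnd     bool
}

func newTrieNode() *trieNode {
	return &trieNode{children: make(map[rune]*trieNode)}
}

// addKeyword inserts keyword into the trie and maps it to cleanName.
func (n *trieNode) addKeyword(keyword, cleanName string) {
	cur := n
	for _, r := range keyword {
		next, ok := cur.children[r]
		if !ok {
			next = newTrieNode()
			cur.children[r] = next
		}
		cur = next
	}
	cur.isEnd = true
	cur.cleanName = cleanName
}

// isWordRune reports whether r is treated as part of a word.
// Assumed rule: letters, digits and underscore.
func isWordRune(r rune) bool {
	return unicode.IsLetter(r) || unicode.IsDigit(r) || r == '_'
}

// extract scans the sentence left to right and returns the clean name of
// the longest keyword found at each word start.
func (n *trieNode) extract(sentence string) []string {
	runes := []rune(sentence)
	var found []string
	i := 0
	for i < len(runes) {
		// Skip positions in the middle of a word.
		if i > 0 && isWordRune(runes[i-1]) && isWordRune(runes[i]) {
			i++
			continue
		}
		cur := n
		matchEnd, matchName := -1, ""
		for j := i; j < len(runes); j++ {
			next, ok := cur.children[runes[j]]
			if !ok {
				break
			}
			cur = next
			// Accept a keyword only if it ends on a word boundary.
			if cur.isEnd && (j+1 == len(runes) || !isWordRune(runes[j+1])) {
				matchEnd, matchName = j+1, cur.cleanName
			}
		}
		if matchEnd > 0 {
			found = append(found, matchName)
			i = matchEnd // continue after the matched keyword
		} else {
			i++
		}
	}
	return found
}

func main() {
	root := newTrieNode()
	root.addKeyword("love", "love")
	root.addKeyword("中国", "中文")
	fmt.Println(root.extract("I love 中国."))
	fmt.Println(root.extract("lovely day"))
}

// [love 中文]
// []
```

Because lookup cost depends on the length of the sentence and the depth of the trie rather than on the number of keywords, this kind of structure scales well to large keyword sets.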
Time taken by FlashText to find terms in comparison to Regex.
Time taken by FlashText to replace terms in comparison to Regex.
Link to code for benchmarking the Find Feature and Replace Feature.
The idea for this library came from the following StackOverflow question.
The original paper on the FlashText algorithm:
@ARTICLE{2017arXiv171100046S,
author = {{Singh}, V.},
title = "{Replace or Retrieve Keywords In Documents at Scale}",
journal = {ArXiv e-prints},
archivePrefix = "arXiv",
eprint = {1711.00046},
primaryClass = "cs.DS",
keywords = {Computer Science - Data Structures and Algorithms},
year = 2017,
month = oct,
adsurl = {http://adsabs.harvard.edu/abs/2017arXiv171100046S},
adsnote = {Provided by the SAO/NASA Astrophysics Data System}
}
An article about FlashText was also published on freeCodeCamp (Medium).
The project is licensed under the MIT license.