huaban / jieba-analysis

Jieba Chinese word segmentation (Java version)
https://github.com/huaban/jieba-analysis
Apache License 2.0
2.55k stars, 835 forks

Stop words are not recognized by the segmenter when punctuation marks are written consecutively #102

Closed: WaylonSong closed this issue 4 years ago

WaylonSong commented 4 years ago
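import com.huaban.analysis.jieba.JiebaSegmenter;
import com.huaban.analysis.jieba.WordDictionary;
import java.util.List;

// Input containing a run of consecutive commas (",,,,")
String content = "(通过文件传输签署交易,类似商业合同电子签名,,,,Ricardian Contract的";
int topN = 10; // declared in the original report but not used in this snippet
WordDictionary dictionary = WordDictionary.getInstance();
JiebaSegmenter segmenter = new JiebaSegmenter(dictionary);
// Segment the sentence and print the resulting tokens
List<String> segments = segmenter.sentenceProcess(content);
System.out.println(segments);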

The result is:

[(, 通过, 文件传输, 签署, 交易, ,, 类似, 商业, 合同, 电子签名, ,,,,, Ricardian,  , Contract, 的]
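
In the output above, the run of commas comes back as the single token ",,,,", so an exact-match stop-word list that only contains "," will not remove it. One possible workaround, sketched below, is to post-filter the token list and drop any token that is a stop word or consists only of punctuation and whitespace. This is not part of jieba-analysis: the class StopWordFilter, its methods, and the stop-word set are hypothetical names introduced for illustration, and the token list is copied from the result above rather than produced by the segmenter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical helper, not part of jieba-analysis.
final class StopWordFilter {
    // Matches tokens made up only of Unicode punctuation and/or whitespace
    // (the empty token matches as well).
    private static final Pattern PUNCT_OR_BLANK = Pattern.compile("[\\p{P}\\s]*");

    static boolean isPunctuationOrBlank(String token) {
        return PUNCT_OR_BLANK.matcher(token).matches();
    }

    // Keeps tokens that are neither listed stop words nor pure punctuation/whitespace.
    static List<String> filter(List<String> tokens, Set<String> stopWords) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (stopWords.contains(token) || isPunctuationOrBlank(token)) {
                continue;
            }
            kept.add(token);
        }
        return kept;
    }

    public static void main(String[] args) {
        // Tokens copied from the reported result.
        List<String> tokens = Arrays.asList(
                "(", "通过", "文件传输", "签署", "交易", ",", "类似", "商业",
                "合同", "电子签名", ",,,,", "Ricardian", " ", "Contract", "的");
        // Assumed stop-word list for the example.
        Set<String> stopWords = new HashSet<>(Arrays.asList(",", "的"));
        System.out.println(filter(tokens, stopWords));
    }
}
```

With the assumed stop-word set, this prints [通过, 文件传输, 签署, 交易, 类似, 商业, 合同, 电子签名, Ricardian, Contract]. Note that \p{P} covers Unicode punctuation, including full-width marks, but not symbols such as "$" or "+", which would need to be listed as stop words explicitly.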