huaban / jieba-analysis

Jieba Chinese word segmentation (Java version)
https://github.com/huaban/jieba-analysis
Apache License 2.0
2.55k stars, 835 forks

Bug at line 144 of JiebaSegmenter #93

Open zhaoxing624 opened 5 years ago

zhaoxing624 commented 5 years ago
if (wordDict.containsWord(paragraph.substring(i, i + 1))) {
    tokens.add(new SegToken(paragraph.substring(i, i + 1), offset, ++offset));
} else {
    tokens.add(new SegToken(paragraph.substring(i, i + 1), offset, ++offset));
}

Isn't there a logic problem in this code? The two branches do exactly the same thing.
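One way to check this is a minimal sketch that runs the branched form next to a collapsed form and compares the tokens. Everything here is an assumption made for illustration: `Token` is a hypothetical stand-in for the library's `SegToken`, a plain `Set<String>` stands in for `wordDict`, and the surrounding loop is guessed, since the snippet above does not show it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SingleCharTokens {

    // Hypothetical stand-in for SegToken: a word with its start and end offsets.
    static final class Token {
        final String word;
        final int start;
        final int end;

        Token(String word, int start, int end) {
            this.word = word;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return word + "[" + start + "," + end + ")";
        }
    }

    // Mirrors the quoted snippet: the dictionary lookup picks a branch,
    // but both branches add the same token.
    static List<Token> withBranches(String paragraph, Set<String> dict) {
        List<Token> tokens = new ArrayList<>();
        int offset = 0;
        for (int i = 0; i < paragraph.length(); i++) {
            String ch = paragraph.substring(i, i + 1);
            if (dict.contains(ch)) {
                tokens.add(new Token(ch, offset, ++offset));
            } else {
                tokens.add(new Token(ch, offset, ++offset));
            }
        }
        return tokens;
    }

    // Collapsed form: no lookup, one token per character.
    static List<Token> withoutBranches(String paragraph) {
        List<Token> tokens = new ArrayList<>();
        int offset = 0;
        for (int i = 0; i < paragraph.length(); i++) {
            String ch = paragraph.substring(i, i + 1);
            tokens.add(new Token(ch, offset, offset + 1));
            offset++;
        }
        return tokens;
    }

    public static void main(String[] args) {
        // "a" and "c" are in the dictionary, "b" is not.
        Set<String> dict = new HashSet<>(Arrays.asList("a", "c"));
        System.out.println(withBranches("abc", dict));
        System.out.println(withoutBranches("abc"));
    }
}
```

In this sketch both methods return the same tokens whether or not a character is in the dictionary, so the `if`/`else` has no effect on the result. Whether the same holds in the library depends on whether `containsWord` has side effects, which the snippet alone does not show.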

catBigcat commented 5 years ago

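I've forgotten how he wrote this part too. In essence the code is building a directed graph, so ask yourself whether the graph it builds is correct. Duplication isn't necessarily a bug.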