GoogleCodeExporter closed this issue 8 years ago
public static void wordSegmentation(String source) throws IOException {
    // "analyzer" and "logger" are fields of the enclosing class.
    TokenStream tokenStream = analyzer.tokenStream("tag", new StringReader(source));
    // addAttribute returns the existing attribute, or registers it if absent.
    OffsetAttribute offsetAttribute = tokenStream.addAttribute(OffsetAttribute.class);
    CharTermAttribute charTermAttribute = tokenStream.addAttribute(CharTermAttribute.class);
    // KeywordAttribute keywordAttribute =
    //         tokenStream.addAttribute(KeywordAttribute.class);
    // Reset the stream before consuming it.
    tokenStream.reset();
    while (tokenStream.incrementToken()) {
        int startOffset = offsetAttribute.startOffset();
        int endOffset = offsetAttribute.endOffset();
        String term = charTermAttribute.toString();
        logger.info(term);
        // logger.info(keywordAttribute.isKeyword());
        // logger.info("valid str " + source.replaceAll(term, "*"));
        logger.info(startOffset + " : " + endOffset);
    }
    // Finish and release the stream once all tokens are read.
    tokenStream.end();
    tokenStream.close();
    logger.info("filtered abc:" + source);
}
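A side note on the commented-out replaceAll line in the method above: replaceAll treats the term as a regular expression and replaces every occurrence, not only the token that was just read. If the goal is to mask the matched tokens, the start and end offsets that are already being logged can be used directly. Below is a minimal sketch in plain Java with no Lucene dependency. The class OffsetMasker and the int[] pairs are hypothetical stand-ins for the values returned by startOffset() and endOffset(); they are not part of Lucene or IK Analyzer.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of Lucene or IK Analyzer.
final class OffsetMasker {

    // Replaces every character covered by a [start, end) span with '*'.
    // Spans may overlap, since fine-grained segmentation can emit overlapping tokens.
    static String mask(String source, int[][] spans) {
        char[] chars = source.toCharArray();
        for (int[] span : spans) {
            // Clamp the span to the bounds of the source string.
            int start = Math.max(0, span[0]);
            int end = Math.min(chars.length, span[1]);
            if (start < end) {
                Arrays.fill(chars, start, end, '*');
            }
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        // Made-up offsets, standing in for startOffset()/endOffset() pairs.
        int[][] spans = { {0, 2}, {4, 6} };
        System.out.println(mask("abcdefgh", spans)); // prints **cd**gh
    }
}
```

In the loop above, each (startOffset, endOffset) pair could be collected into a list and passed to such a helper after the stream is exhausted.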
Original comment by loujan...@gmail.com
on 16 Mar 2012 at 2:32
Hello, I suggest using the lukeall-3.5.0.jar tool to inspect how the index turned out.
Original comment by kingcs2008@gmail.com
on 17 Jul 2012 at 8:16
/**
 * Prints the word segmentation result.
 * @param content the text to segment
 * @throws IOException if reading the content fails
 */
public void testAnalyzer(String content) throws IOException {
    analyzer = new IKAnalyzer();
    TokenStream tokenStream = analyzer.tokenStream("txt", new StringReader(content));
    // Register the term attribute once, before consuming the stream.
    CharTermAttribute charTermAttribute = tokenStream.addAttribute(CharTermAttribute.class);
    tokenStream.reset();
    while (tokenStream.incrementToken()) {
        System.out.print(charTermAttribute.toString() + " | ");
    }
    // Finish and release the stream once all tokens are read.
    tokenStream.end();
    tokenStream.close();
}
testIKAnalyzer.testAnalyzer("北京政采科技有限公司");
北京 | 政采 | 科技 | 有限公司 | 有限 | 公司 |
Original comment by kingcs2008@gmail.com
on 23 Jul 2012 at 4:59
Original comment by linliang...@gmail.com
on 23 Oct 2012 at 9:34
Original issue reported on code.google.com by
lydialmr...@gmail.com
on 25 Oct 2011 at 9:15