Open GoogleCodeExporter opened 8 years ago
I synced the Paoding (庖丁) source code and made some modifications. Part of the modified code is shown at: http://blog.csdn.net/foamflower/archive/2010/07/09/5723361.aspx

Test code:

    protected static PaodingAnalyzer analyzer = new PaodingAnalyzer();
    protected static StringBuilder sb = new StringBuilder();

    protected static String dissect(String input) {
        try {
            TokenStream ts = analyzer.tokenStream("", new StringReader(input));
            ts.addAttribute(TermAttribute.class);
            while (ts.incrementToken()) {
                TermAttribute ta = ts.getAttribute(TermAttribute.class);
                sb.append(ta.term());
                sb.append(" ");
            }
            return sb.toString();
        } catch (Exception e) {
            e.printStackTrace();
            return "error";
        }
    }

    /**
     * @param args
     */
    public static void main(String[] args) {
        String content = TestAnalyzer.dissect("关于印发《广东电网公司广州供电局“十一五”科技发展计划》的通知");
        System.out.println(content);
    }

Segmentation result:

    "关于 印发 广东 电网 公司 广州 供电 供电局 十一五 25 科技 发展 计划 通知"

Why is there an extra "25" in the output?
Original issue reported on code.google.com by stt...@163.com on 10 Aug 2010 at 6:14
Thanks. If that is indeed the case, it is a bug.
Original comment by qieqie.wang on 10 Aug 2010 at 6:33