ryanqq / paoding

Automatically exported from code.google.com/p/paoding

Word segmentation problem, possibly caused by my test code; reproducible, but not a general problem #54

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Paoding paoding = PaodingMaker.make();
2. Analyzer writeAnalyzer = PaodingAnalyzer.writeMode(paoding);
3. ...
4. String to be indexed: '碳酸钠碳酸钠碳酸'
5. Printing the tokens with TokenStream & TermAttribute gives:

碳酸
酸钠
碳酸
碳酸
酸钠

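6. I checked the dict, and it already contains the word '碳酸钠', but I don't
understand why Paoding does not emit the three-character word when segmenting.
With other words, such as '碰碰车', it does emit both '碰碰' & '碰碰车'.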
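
To make the expectation in step 6 concrete, here is a minimal, self-contained sketch of an exhaustive dictionary matcher. It is not Paoding's algorithm and uses no Paoding or Lucene classes; the class name `ExhaustiveMatchSketch`, the `segment` method, and the three-word dictionary are all hypothetical, chosen only to show which tokens one would get if every dictionary word were emitted at every position.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only: NOT how Paoding segments text.
public class ExhaustiveMatchSketch {

    // Emits every dictionary word found at every start offset,
    // ordered by start offset, then by increasing length.
    static List<String> segment(String text, Set<String> dict) {
        int maxLen = 0;
        for (String word : dict) {
            maxLen = Math.max(maxLen, word.length());
        }
        List<String> tokens = new ArrayList<>();
        for (int start = 0; start < text.length(); start++) {
            int limit = Math.min(text.length(), start + maxLen);
            for (int end = start + 1; end <= limit; end++) {
                String candidate = text.substring(start, end);
                if (dict.contains(candidate)) {
                    tokens.add(candidate);
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Assumed dictionary entries; the real Paoding dict is far larger.
        Set<String> dict = new HashSet<>(Arrays.asList("碳酸", "酸钠", "碳酸钠"));
        for (String token : segment("碳酸钠碳酸钠碳酸", dict)) {
            System.out.println(token);
        }
    }
}
```

Under these assumptions the sketch emits '碳酸钠' twice, alongside the two-character words, which is the kind of output expected from the write-mode analyzer here.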

Lucene 3.0
Paoding 2.0.04 stable + 3 Java patches
(http://code.google.com/p/paoding/issues/detail?id=49)

Original issue reported on code.google.com by mafa...@gmail.com on 11 Jan 2010 at 8:48