Open GoogleCodeExporter opened 9 years ago
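I found that the segmenter in version 1.05 does not handle punctuation and English words very well.

tag = new CWSTagger("./models/seg.c7.110918.gz", "./models/dict.txt");
System.out.println("\n使用词典"); // prints "Using dictionary"
str = "今天的#NEXT WAVE#新星是一位“天之骄子”";
s = tag.tag(str);
System.out.println(s);

Input: 今天的#NEXT WAVE#新星是一位“天之骄子”
Here #NEXT WAVE# is segmented as #NEXT/WAVE#.

Input: 今天的NEXT WAVE新星是一位“天之骄子”
Here NEXT WAVE is segmented as NEXTWAVE.

None of these words are in the custom dictionary. Is there any additional configuration for the segmenter that affects this?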
Original issue reported on code.google.com by hgs19861...@sina.com on 30 May 2012 at 6:33