liuchjlu / fudannlp

Automatically exported from code.google.com/p/fudannlp

Segmentation bug in the version 1.05 word segmenter #24

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
I found that the version 1.05 word segmenter does not handle punctuation and English words very well.

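        // Load the segmentation model together with the custom dictionary.
        tag = new CWSTagger("./models/seg.c7.110918.gz", "./models/dict.txt");
        System.out.println("\n使用词典"); // prints "Using the dictionary"
        str = "今天的#NEXT WAVE#新星是一位“天之骄子”";
        s = tag.tag(str);
        System.out.println(s);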
今天的#NEXT WAVE#新星是一位“天之骄子”
Here #NEXT WAVE# is segmented as #NEXT/WAVE#.

今天的NEXT WAVE新星是一位“天之骄子”
Here NEXT WAVE is segmented as NEXTWAVE.

The custom dictionary contains none of these words. Is there some additional configuration that segmentation requires?
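
To make the expected handling of the English words concrete, here is a minimal sketch that separates runs of ASCII letters and digits from the surrounding text before any Chinese segmentation happens. It uses only the Java standard library. The class `LatinRunSplitter` and its `split` method are hypothetical names introduced for illustration; they are not part of FudanNLP, and this is not a confirmed fix for the behaviour reported above. One could, presumably, pass only the non-Latin runs to `tag.tag(...)`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of FudanNLP.
class LatinRunSplitter {
    // Either a run of ASCII letters/digits, or a run of anything else
    // that is neither ASCII alphanumeric nor whitespace.
    private static final Pattern RUN =
            Pattern.compile("[A-Za-z0-9]+|[^A-Za-z0-9\\s]+");

    // Splits text into alternating Latin and non-Latin runs.
    // Whitespace between runs is dropped, so "NEXT WAVE" yields two runs.
    static List<String> split(String text) {
        List<String> runs = new ArrayList<>();
        Matcher m = RUN.matcher(text);
        while (m.find()) {
            runs.add(m.group());
        }
        return runs;
    }

    public static void main(String[] args) {
        // Each run is printed on its own line.
        for (String run : split("今天的NEXT WAVE新星是一位“天之骄子”")) {
            System.out.println(run);
        }
    }
}
```

With this approach NEXT and WAVE stay separate tokens regardless of the dictionary. Punctuation such as # remains attached to the neighbouring Chinese run, so it would still be up to the segmenter to split it off.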

Original issue reported on code.google.com by hgs19861...@sina.com on 30 May 2012 at 6:33