huaban / jieba-analysis

Jieba Chinese word segmentation (Java version)
https://github.com/huaban/jieba-analysis
Apache License 2.0
2.55k stars 835 forks

Bug in CharacterUtil method #92

Open zhaoxing624 opened 5 years ago

zhaoxing624 commented 5 years ago
    /**
     * Full-width to half-width, uppercase to lowercase
     * 
     * @param input
     *            the input character
     * @return the converted character
     */
    public static char regularize(char input) {
        if (input == 12288) {
            return 32;
        }
        else if (input > 65280 && input < 65375) {
            return (char) (input - 65248);
        }
        else if (input >= 'A' && input <= 'Z') {
            return (input += 32);
        }
        return input;
    }

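Hey, there is a problem with the full-width to half-width conversion here: a full-width A-Z character never gets converted to lowercase. Because the branches are chained with else if, a full-width 'A' (U+FF21) is mapped to half-width 'A' in the second branch and returned immediately, so the uppercase-to-lowercase branch is never reached.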
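One possible fix, as a minimal sketch rather than the maintainers' actual patch: do the width conversion first, then run the lowercase check as a separate `if` on the result. The class name `RegularizeSketch` is made up for this example; the numeric constants are the ones from the quoted method.

```java
final class RegularizeSketch {
    private RegularizeSketch() {}

    /**
     * Full-width to half-width, then uppercase to lowercase.
     * Sketch only: same constants as the quoted method, but the
     * lowercase step is no longer an else-if branch.
     */
    static char regularize(char input) {
        char c = input;
        if (c == 12288) {
            // U+3000 (ideographic space) becomes an ASCII space
            c = 32;
        } else if (c > 65280 && c < 65375) {
            // U+FF01..U+FF5E map to ASCII by subtracting 0xFEE0 (65248)
            c = (char) (c - 65248);
        }
        // Runs on the converted character, so full-width A-Z is covered too
        if (c >= 'A' && c <= 'Z') {
            c = (char) (c + 32);
        }
        return c;
    }
}
```

With this version, `regularize('\uFF21')` (full-width A) returns `'a'`, while half-width input and characters outside both ranges behave as before.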

catBigcat commented 5 years ago

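WordDictionary loads the dictionary with the following code (around line 150):

    private String addWord(String word) {
        if (null != word && !"".equals(word.trim())) {
            String key = word.trim().toLowerCase(Locale.getDefault());
            _dict.fillSegment(key.toCharArray());
            return key;
        }
        else
            return null;
    }

This interacts with the bug: dictionary keys are stored in lowercase, so input that regularize leaves in uppercase presumably cannot match them.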
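The interaction can be illustrated with a small, self-contained sketch. Several things here are assumptions made for the demonstration: a `HashSet` stands in for the real `_dict` structure, the class and helper names (`DictMismatchDemo`, `regularizeAll`, `lookup`) are hypothetical, and it is assumed that input text is passed through `regularize` character by character before lookup. Only `addWord` and the reported `regularize` body follow the code quoted in this thread.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class DictMismatchDemo {
    // Assumption: a plain set stands in for the library's _dict structure
    private static final Set<String> DICT = new HashSet<>();

    // Same key normalization as the quoted addWord: trim, then lowercase
    static String addWord(String word) {
        if (null != word && !"".equals(word.trim())) {
            String key = word.trim().toLowerCase(Locale.getDefault());
            DICT.add(key);
            return key;
        }
        return null;
    }

    // The method exactly as reported in the issue (else-if chain)
    static char regularizeAsReported(char input) {
        if (input == 12288) {
            return 32;
        } else if (input > 65280 && input < 65375) {
            return (char) (input - 65248);
        } else if (input >= 'A' && input <= 'Z') {
            return (input += 32);
        }
        return input;
    }

    // Hypothetical helper: regularize every character of a token
    static String regularizeAll(String token) {
        StringBuilder sb = new StringBuilder(token.length());
        for (int i = 0; i < token.length(); i++) {
            sb.append(regularizeAsReported(token.charAt(i)));
        }
        return sb.toString();
    }

    // Hypothetical helper: look a token up after regularizing it
    static boolean lookup(String token) {
        return DICT.contains(regularizeAll(token));
    }
}
```

After `addWord("ABC")` the stored key is `"abc"`. A half-width `"ABC"` regularizes to `"abc"` and is found, but a full-width `"\uFF21\uFF22\uFF23"` regularizes to uppercase `"ABC"` and is not.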