hankcs / HanLP

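Chinese word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, constituency parsing, semantic dependency parsing, semantic role labeling, coreference resolution, style transfer, semantic similarity, new word discovery, keyword and phrase extraction, automatic summarization, text classification and clustering, pinyin and simplified/traditional Chinese conversion, natural language processing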
https://hanlp.hankcs.com/
Apache License 2.0
33.84k stars, 10.12k forks

Java: DynamicCustomDictionary.load does not take effect when loading a dictionary #1782

Closed duanfa closed 2 years ago

duanfa commented 2 years ago

Describe the bug DynamicCustomDictionary

Code to reproduce the issue: dictionary.load("/home/duanfa/trash/845.txt");

The dictionary loaded with DynamicCustomDictionary.load does not take effect. Contents of 845.txt:

爱慕官方花园速写内衣女无钢圈中厚杯小胸聚拢蕾丝文胸AM171791 nz 1
爱慕官方花 nz 1

Using insert works: dictionary.insert("爱慕官方花园速写内衣女无钢圈中厚杯小胸聚拢蕾丝文胸AM171791", "N 1");

Alternatively, set CustomDictionaryPath=data/dictionary/custom/duanfa/845.txt in hanlp.properties. Once that takes effect, copy data/dictionary/custom/duanfa/845.txt.bin into the /home/duanfa/trash/ directory; after that, dictionary.load("/home/duanfa/trash/845.txt"); works.

Version

com.hankcs hanlp portable-1.8.3
import com.hankcs.hanlp.HanLP;
import com.hankcs.hanlp.dictionary.DynamicCustomDictionary;
import com.hankcs.hanlp.seg.Segment;
import com.hankcs.hanlp.seg.common.Term;

public class LoadCustomFile {
    public DynamicCustomDictionary dictionary = new DynamicCustomDictionary();
    public Segment hanlpSegmenter;

    public LoadCustomFile() {
        try {
            hanlpSegmenter = HanLP.newSegment();
            hanlpSegmenter.enableCustomDictionary(dictionary).enableCustomDictionaryForcing(true);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) {
        String text = "爱慕官方花园速写内衣女无钢圈中厚杯小胸聚拢蕾丝文胸AM171791";
        LoadCustomFile lcf = new LoadCustomFile();
        lcf.dictionary.load("/home/duanfa/trash/845.txt");
        for (Term term : lcf.hanlpSegmenter.seg(text)) {
            System.out.println(term);
        }
    }
}

Output printed by the code: 爱慕/vn 官方/n 花园/n 速写/n 内衣/n 女/b 无/v 钢圈/n 中/f 厚/a 杯/q 小/a 胸/ng 聚拢/v 蕾/ng 丝/q 文/ng 胸/ng AM/nx 171791/m

duanfa commented 2 years ago

First save the dynamically inserted words to a bin file; after that, the next load works.

Huyueeer commented 2 years ago

@duanfa May I ask: does "saving as bin" mean that, for example, xxx.txt becomes xxx.txt.bin?

duanfa commented 1 year ago

When you save, the file is automatically written with the .bin suffix.
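
The observations in this thread suggest that load only took effect once a 845.txt.bin file sat next to 845.txt. A minimal sketch of a pre-flight check for that condition follows, using only the Java standard library. DictionaryCacheCheck, cachePathFor and hasBinaryCache are hypothetical names for illustration and are not part of HanLP; the ".bin next to the .txt" convention is assumed from the paths quoted above, not from HanLP documentation.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of HanLP: checks whether a binary cache
// file (<name>.txt.bin) exists beside a custom dictionary text file.
final class DictionaryCacheCheck {

    // Maps a dictionary path to its assumed sibling cache path,
    // e.g. /home/duanfa/trash/845.txt -> /home/duanfa/trash/845.txt.bin
    static Path cachePathFor(Path dictionaryTxt) {
        String fileName = dictionaryTxt.getFileName().toString();
        return dictionaryTxt.resolveSibling(fileName + ".bin");
    }

    // True only if the assumed cache file exists and is a regular file.
    static boolean hasBinaryCache(Path dictionaryTxt) {
        return Files.isRegularFile(cachePathFor(dictionaryTxt));
    }

    public static void main(String[] args) {
        // Default path taken from the issue; pass another path as the first argument.
        Path txt = Paths.get(args.length > 0 ? args[0] : "/home/duanfa/trash/845.txt");
        System.out.println(txt + " has a .bin cache: " + hasBinaryCache(txt));
    }
}
```

Running this check before dictionary.load(...) would show whether the situation described in the issue applies: if it reports that no cache exists, falling back to insert, as reported above, may be the safer route.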