hankcs / hanlp-lucene-plugin

HanLP Chinese word segmentation plugin for Lucene, supporting Lucene-based systems including Solr
http://www.hankcs.com/nlp/segment/full-text-retrieval-solr-integrated-hanlp-chinese-word-segmentation.html
Apache License 2.0
296 stars 99 forks

Custom dictionary configuration has no effect #31

Open suyuanhxx opened 6 years ago

suyuanhxx commented 6 years ago

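I configured a custom dictionary in both of the ways described in the documentation, but neither generated a *.bin file. The Solr version is 7.1.

  1. Setting customDictionaryPath in schema.xml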
    <fieldType name="text_cn" class="solr.TextField">
        <analyzer type="index">
            <tokenizer class="com.hankcs.lucene.HanLPTokenizerFactory" enableIndexMode="true" customDictionaryPath="E:\Develop\solr-7.1.0\server\solr-webapp\webapp\WEB-INF\classes\hanlp\data\dictionary\custom\Organization.txt"/>
        </analyzer>
        <analyzer type="query">
            <tokenizer class="com.hankcs.lucene.HanLPTokenizerFactory" enableIndexMode="false"/>
        </analyzer>
    </fieldType>

    The core is located in E:\Develop\solr-7.1.0\server\solr\mycore, and schema.xml is in E:\Develop\solr-7.1.0\server\solr\mycore\conf

  2. Setting CustomDictionaryPath in hanlp.properties did not take effect either
    root=E:/Develop/solr-7.1.0/server/solr-webapp/webapp/WEB-INF/classes/hanlp/
    CustomDictionaryPath=data/dictionary/custom/CustomDictionary.txt; Organization.txt;

    hanlp.properties is located in E:\Develop\solr-7.1.0\server\solr-webapp\webapp\WEB-INF\classes

With either configuration, no .bin file is generated in the directory containing Organization.txt. Is that expected?

suyuanhxx commented 6 years ago

It has no effect on Windows; on Linux, the first method works.

duringall commented 4 years ago

In the first method the path is wrong: \ should be /
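
As a rough sketch of that suggestion, the field type from the original post could be written with forward slashes as below. The path is simply the one from the original post with the separators swapped. Whether this change alone makes the .bin file appear on Windows is an assumption; it was not confirmed in this thread.

```xml
<fieldType name="text_cn" class="solr.TextField">
    <analyzer type="index">
        <!-- Assumption: same dictionary file as in the original post, written with "/" instead of "\" -->
        <tokenizer class="com.hankcs.lucene.HanLPTokenizerFactory" enableIndexMode="true" customDictionaryPath="E:/Develop/solr-7.1.0/server/solr-webapp/webapp/WEB-INF/classes/hanlp/data/dictionary/custom/Organization.txt"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="com.hankcs.lucene.HanLPTokenizerFactory" enableIndexMode="false"/>
    </analyzer>
</fieldType>
```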

duringall commented 4 years ago

For the second method, see the following for how to configure the root path: https://github.com/hankcs/HanLP/tree/1.x

fishfree commented 1 month ago

@duringall The second method you described does not work for me, and I am on Linux. It looks as if hanlp-portable.jar does not read the settings in hanlp.properties at all.