hankcs / hanlp-lucene-plugin

HanLP Chinese word segmentation plugin for Lucene, supporting Lucene-based systems including Solr
http://www.hankcs.com/nlp/segment/full-text-retrieval-solr-integrated-hanlp-chinese-word-segmentation.html
Apache License 2.0
296 stars, 99 forks

Beginner question about querying #14

Closed: barrycheng closed this issue 7 years ago

barrycheng commented 7 years ago

Version: Solr 6.2

Schema.xml:

<fieldType name="text_cn" class="solr.TextField">
      <analyzer type="index">
        <tokenizer class="com.hankcs.lucene.HanLPTokenizerFactory" enableIndexMode="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="com.hankcs.lucene.HanLPTokenizerFactory" enableIndexMode="false"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
</fieldType>

Analyse: default

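Problem: in the index, 老人头男皮带(505J014L)包邮 ("老人头 men's leather belt (505J014L), free shipping") is segmented into 老人头 | 老人 | 人头, but my query 老人头 is segmented into 老 | 人头, so the search returns no results. What is causing this, and how should I handle it?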

hankcs commented 7 years ago
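  1. You have enabled index mode (enableIndexMode="true" on the index analyzer), which is why the indexed text also yields the sub-words 老人 and 人头.
  2. Both token streams contain 人头, so the document can be found; the claim that it returns no results simply doesn't hold: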
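import com.hankcs.lucene.HanLPAnalyzer;
import com.hankcs.lucene.HanLPIndexAnalyzer;
import junit.framework.TestCase;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

// Hypothetical JUnit 3 wrapper (assertEquals comes from TestCase); written against the Lucene 6.x API
public class HanLPSearchExampleTest extends TestCase
{
    public void testIndexAndSearch() throws Exception
    {
        // Index time: index mode, matching enableIndexMode="true" in the schema
        Analyzer analyzer = new HanLPIndexAnalyzer();
        IndexWriterConfig config = new IndexWriterConfig(analyzer);
        config.setOpenMode(IndexWriterConfig.OpenMode.CREATE);
        Directory directory = new RAMDirectory();
        IndexWriter indexWriter = new IndexWriter(directory, config);

        Document document = new Document();
        document.add(new TextField("content", "[新闻]服务大众。", Field.Store.YES));
        indexWriter.addDocument(document);

        document = new Document();
        document.add(new TextField("content", "[经济学]商品和服务", Field.Store.YES));
        indexWriter.addDocument(document);

        document = new Document();
        document.add(new TextField("content", "[服装店]老人头男皮带", Field.Store.YES));
        indexWriter.addDocument(document);

        indexWriter.commit();
        indexWriter.close();

        IndexReader ireader = DirectoryReader.open(directory);
        IndexSearcher isearcher = new IndexSearcher(ireader);
        // Query time: standard (non-index) mode, matching enableIndexMode="false"
        QueryParser parser = new QueryParser("content", new HanLPAnalyzer());
        Query query = parser.parse("老人头");
        ScoreDoc[] hits = isearcher.search(query, 300000).scoreDocs;
        // Only the belt document shares a term (人头) with the query
        assertEquals(1, hits.length);
        for (ScoreDoc scoreDoc : hits)
        {
            Document targetDoc = isearcher.doc(scoreDoc.doc);
            System.out.println(targetDoc.getField("content").stringValue());
        }
        ireader.close();
    }
}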

Output:

[服装店]老人头男皮带
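The reply's argument (both sides contain 人头, so the document matches) can be reduced to a small, self-contained model. This is a toy sketch, not HanLP's or Lucene's actual code: it assumes a hypothetical three-word dictionary (老人头, 老人, 人头), imitates index mode by adding every dictionary sub-word of length at least 2, and treats matching as plain term overlap, ignoring scoring and positions. The class and method names are made up for illustration. Under OR semantics, the default operator of Lucene's classic QueryParser, the shared term 人头 is enough; under AND semantics the query-side 老 would also have to be indexed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy model only: the class, method names and dictionary are hypothetical
public class TokenOverlapDemo
{
    // Hypothetical three-word dictionary standing in for HanLP's real one
    static final Set<String> DICT = new HashSet<>(Arrays.asList("老人头", "老人", "人头"));

    // Toy "index mode": keep the word, then add every dictionary word of length >= 2 inside it
    static List<String> indexModeExpand(String word)
    {
        Set<String> out = new LinkedHashSet<>();
        out.add(word);
        for (int len = word.length() - 1; len >= 2; len--)
        {
            for (int i = 0; i + len <= word.length(); i++)
            {
                String sub = word.substring(i, i + len);
                if (DICT.contains(sub))
                {
                    out.add(sub);
                }
            }
        }
        return new ArrayList<>(out);
    }

    // OR semantics: one shared term is enough for a match
    static boolean matchesAny(List<String> indexed, List<String> query)
    {
        for (String term : query)
        {
            if (indexed.contains(term))
            {
                return true;
            }
        }
        return false;
    }

    // AND semantics: every query term must have been indexed
    static boolean matchesAll(List<String> indexed, List<String> query)
    {
        return indexed.containsAll(query);
    }

    public static void main(String[] args)
    {
        List<String> indexed = indexModeExpand("老人头");   // [老人头, 老人, 人头]
        List<String> query = Arrays.asList("老", "人头");   // query-side split reported in the issue
        System.out.println("indexed:   " + indexed);
        System.out.println("OR match:  " + matchesAny(indexed, query));   // true: 人头 is shared
        System.out.println("AND match: " + matchesAll(indexed, query));   // false: 老 was not indexed
    }
}
```

In this toy, only the query-side 老 lacks an indexed counterpart, so a query that requires every term would miss the document even though the OR query finds it.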