Solr indexing question - Githubissues

hankcs / hanlp-lucene-plugin

HanLP Chinese word segmentation plugin for Lucene, supporting Lucene-based systems including Solr

Apache License 2.0

294 stars 99 forks

Open benchuang11046 opened 8 years ago

benchuang11046 commented 8 years ago

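Hello. When I run a sentence through analysis and indexing in Solr, for example 土木水利工程界出身的副市長林陵三指出, the analysis output is 土木水利工程界出身的副市长林陵三指出. When I then search for 水利, this sentence is not returned. How can I either keep the analyzer from merging 水利工程 into a single token, or have it still emit 水利工程 in a way that lets a search for 水利 match?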

Another example: 習近平政府 also becomes a single token tagged nt. Is there another mode that can split 習近平政府 into its parts?

Thanks.

benchuang11046 commented 8 years ago

One more question, about this schema.xml setting:

<analyzer type="query">
      <tokenizer class="com.hankcs.lucene.HanLPTokenizerFactory" enableIndexMode="false"/>
</analyzer>

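Does enableIndexMode control whether the query text is segmented in index mode at query time? That is, if it is true the query is matched using 習近平 and 政府, and if it is false the query is matched using 習近平政府?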

What should I do if I only want to match on the joined, unsplit word?

Thanks.

hankcs commented 8 years ago

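As you said, index mode can split “習近平政府” into finer-grained tokens. You can use <analyzer type="query"> and <analyzer type="index"> to specify different segmentation strategies for query time and index time respectively.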
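
A minimal sketch of what that could look like in schema.xml. The tokenizer class and the enableIndexMode attribute come from this thread. The field type name text_cn and the choice of which side gets true or false are assumptions for illustration, not a confirmed recommendation from the author.

```xml
<!-- Sketch only: "text_cn" is a hypothetical field type name. -->
<fieldType name="text_cn" class="solr.TextField">
  <!-- Index time: index mode on, so long words can be split into finer tokens. -->
  <analyzer type="index">
    <tokenizer class="com.hankcs.lucene.HanLPTokenizerFactory" enableIndexMode="true"/>
  </analyzer>
  <!-- Query time: index mode off, so a query such as 習近平政府 stays one token. -->
  <analyzer type="query">
    <tokenizer class="com.hankcs.lucene.HanLPTokenizerFactory" enableIndexMode="false"/>
  </analyzer>
</fieldType>
```

Whether a shorter word such as 水利 is actually emitted for 水利工程 depends on the tokenizer and its dictionary, so it is worth checking the token output on the Analysis screen of the Solr admin UI before reindexing.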