sing1ee / analyzer-solr

Analyzer adapter for Solr 5. We support Jieba now, and plan to support Stanford in the future.
MIT License

Why do the results come out repeated four times? #8

Closed vkjuju closed 7 years ago

vkjuju commented 7 years ago

@sing1ee, why do the results come out repeated four times? See the screenshot below. Thanks. https://drive.google.com/file/d/0B3n0L-fAmNEXY08xWFpsZ0Rzanc/view?usp=sharing

vkjuju commented 7 years ago

@sing1ee, both Traditional and Simplified Chinese input produce four rows. Could you please look into the cause? Thanks.

vkjuju commented 7 years ago

@sing1ee, do you have time to take a look? Please, I'm waiting online... Thanks.

sing1ee commented 7 years ago

@vkjuju I've been quite busy lately. I'll find time to look into it; it shouldn't behave like this.

vkjuju commented 7 years ago

My managed-schema is configured as follows. Thanks for your help:

<fieldType name="text_jieba" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="analyzer.solr5.jieba.JiebaTokenizerFactory"  segMode="SEARCH" userDict="/usr/local/apache-tomcat-9.0.0.M21/webapps/solr/WEB-INF/classes"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="analyzer.solr5.jieba.JiebaTokenizerFactory"  segMode="SEARCH" userDict="/usr/local/apache-tomcat-9.0.0.M21/webapps/solr/WEB-INF/classes"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
  </analyzer>
</fieldType>

sing1ee commented 7 years ago

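@vkjuju This is not the result repeated four times. It is a display of the analysis process, and you see it because you have four filters.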
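To illustrate why an analysis view can show several rows for a single input, here is a minimal Python sketch of an analyzer chain that records the token list after every stage. It is a toy model, not Solr or Jieba code: the whitespace tokenizer, the tiny stopword set, the naive stemmer, and all function names are assumptions made for this example. It assumes the display prints one row per stage (tokenizer plus each filter), which would match the counts reported in this thread: one tokenizer plus three filters in the index analyzer gives four rows.

```python
from typing import Callable, List, Tuple

TokenFilter = Callable[[List[str]], List[str]]

# Hypothetical stopword list standing in for stopwords.txt.
STOPWORDS = {"the", "a", "an"}


def whitespace_tokenizer(text: str) -> List[str]:
    """Stand-in for the real tokenizer: split on whitespace."""
    return text.split()


def stop_filter(tokens: List[str]) -> List[str]:
    """Drop stopwords, ignoring case (like ignoreCase="true")."""
    return [t for t in tokens if t.lower() not in STOPWORDS]


def lowercase_filter(tokens: List[str]) -> List[str]:
    """Lowercase every token."""
    return [t.lower() for t in tokens]


def toy_stem_filter(tokens: List[str]) -> List[str]:
    """Very naive stemmer (NOT Snowball): strip one trailing 's'."""
    return [t[:-1] if len(t) > 3 and t.endswith("s") else t for t in tokens]


def analyze(text: str, filters: List[Tuple[str, TokenFilter]]) -> List[Tuple[str, List[str]]]:
    """Run the chain and return one (stage name, tokens) row per stage."""
    tokens = whitespace_tokenizer(text)
    rows = [("Tokenizer", tokens)]
    for name, token_filter in filters:
        tokens = token_filter(tokens)
        rows.append((name, tokens))
    return rows


# Mirrors the shape of the index analyzer above: tokenizer + 3 filters.
FULL_CHAIN = [
    ("StopFilter", stop_filter),
    ("LowerCaseFilter", lowercase_filter),
    ("StemFilter", toy_stem_filter),
]

# Tokenizer + 1 filter, as if the other filters were commented out.
REDUCED_CHAIN = [("StopFilter", stop_filter)]

if __name__ == "__main__":
    for stage, stage_tokens in analyze("The Solr Filters", FULL_CHAIN):
        print(f"{stage:16} {stage_tokens}")
```

In this sketch every row describes the same input text at a different point in the chain, and only the last row corresponds to what would be indexed or searched. Removing filters shortens the display but does not make the earlier output "more correct".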

vkjuju commented 7 years ago

Four filters? I don't understand. After installing it, I submitted one line of text and it came out as four rows.

vkjuju commented 7 years ago

@sing1ee, after I edited managed-schema, there are now only two rows. Is a single row the normal result? Thanks.

<fieldType name="text_jieba" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="analyzer.solr5.jieba.JiebaTokenizerFactory"  segMode="SEARCH" userDict="/usr/local/apache-tomcat-9.0.0.M21/webapps/solr/WEB-INF/classes"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
<!--    <filter class="solr.LowerCaseFilterFactory"/>  -->
<!--    <filter class="solr.SnowballPorterFilterFactory" language="English"/> -->
  </analyzer>
  <analyzer type="query">
    <tokenizer class="analyzer.solr5.jieba.JiebaTokenizerFactory"  segMode="SEARCH" userDict="/usr/local/apache-tomcat-9.0.0.M21/webapps/solr/WEB-INF/classes"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
<!--    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/> -->
<!--    <filter class="solr.LowerCaseFilterFactory"/> -->
<!--    <filter class="solr.SnowballPorterFilterFactory" language="English"/> -->
  </analyzer>
</fieldType>

sing1ee commented 7 years ago

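You don't need to change this, so don't comment those filters out. Showing four rows here is correct. You can already use it for word segmentation.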