sul-dlss-deprecated / dor_indexing_app

An indexing API for Stanford's Digital Object Repository
https://sul-dlss-deprecated.github.io/dor_indexing_app/
Apache License 2.0

Stop using deprecated textNoStem fields - use textUnstemmed instead #1069

Open ndushay opened 11 months ago

ndushay commented 11 months ago

Prereq: integration tests in Argo that verify the affected Solr fields behave as expected.

The goal of this ticket is to stop using some deprecated Solr field types in Argo.

In argo-xx schema.xml:

    <!-- Text tokenized without stemming -->
    <dynamicField name="*_text_unstemmed_i"   type="textUnstemmed" indexed="true"  stored="false" multiValued="false"/>
    <dynamicField name="*_text_unstemmed_im"  type="textUnstemmed" indexed="true"  stored="false" multiValued="true"/>
    <dynamicField name="*_text_unstemmed_si"  type="textUnstemmed" indexed="true"  stored="true"  multiValued="false"/>
    <dynamicField name="*_text_unstemmed_sim" type="textUnstemmed" indexed="true"  stored="true"  multiValued="true"/>
    <!-- DEPRECATED:  textNoStem is a deprecated type -->
    <dynamicField name="*_text_nostem_i"  type="textNoStem" indexed="true"  stored="false" multiValued="false"/>
    <dynamicField name="*_text_nostem_im" type="textNoStem" indexed="true"  stored="false" multiValued="true"/>
...
snip
...
    <!-- DEPRECATED:  use textUnstemmed, as WordDelimiterFilterFactory is deprecated.  Analyzed Text, no Stemming or Synonyms -->
    <fieldtype name="textNoStem" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.ICUFoldingFilterFactory"/>
        <!-- NFKC, case folding, diacritics removed -->
        <filter class="solr.WordDelimiterFilterFactory" splitOnCaseChange="1" generateWordParts="1" catenateWords="1" splitOnNumerics="0" generateNumberParts="1" catenateNumbers="1" catenateAll="0" preserveOriginal="0" stemEnglishPossessive="0"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldtype>
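
For illustration only, here is a rough sketch of what a non-deprecated equivalent of the analyzer chain above could look like. This is an assumption, not the actual `textUnstemmed` definition from the Argo schema (which is not shown in this ticket); the type name `textUnstemmedSketch` is hypothetical. It swaps `WordDelimiterFilterFactory` for `WordDelimiterGraphFilterFactory` with the same options, and adds `FlattenGraphFilterFactory` on the index side, since graph-producing filters need their output flattened before indexing.

```xml
<!-- HYPOTHETICAL sketch: not copied from the Argo schema; the name textUnstemmedSketch is made up for illustration -->
<fieldType name="textUnstemmedSketch" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- NFKC, case folding, diacritics removed -->
    <filter class="solr.ICUFoldingFilterFactory"/>
    <!-- graph-aware replacement for the deprecated WordDelimiterFilterFactory, same options as textNoStem -->
    <filter class="solr.WordDelimiterGraphFilterFactory" splitOnCaseChange="1" generateWordParts="1" catenateWords="1" splitOnNumerics="0" generateNumberParts="1" catenateNumbers="1" catenateAll="0" preserveOriginal="0" stemEnglishPossessive="0"/>
    <!-- index-time only: flatten the token graph so it can be written to the index -->
    <filter class="solr.FlattenGraphFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ICUFoldingFilterFactory"/>
    <!-- no flattening at query time, so the query parser can use the token graph -->
    <filter class="solr.WordDelimiterGraphFilterFactory" splitOnCaseChange="1" generateWordParts="1" catenateWords="1" splitOnNumerics="0" generateNumberParts="1" catenateNumbers="1" catenateAll="0" preserveOriginal="0" stemEnglishPossessive="0"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
```

If the real `textUnstemmed` type differs from this sketch (for example in tokenizer or word-delimiter options), search behavior may change when fields move over, which is why the integration tests are listed as a prerequisite.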

Existing fields using type textNoStem

NOTE: if a single stored field will do the job, store it (e.g. the primary author field can be used for both display and search if it is stored).

Steps