sul-dlss-deprecated / dor_indexing_app

An indexing API for Stanford's Digital Object Repository
https://sul-dlss-deprecated.github.io/dor_indexing_app/
Apache License 2.0

improve relevancy with exactish, tokenized, unstemmed and stemmed field flavors where appropriate #1032

Open ndushay opened 1 year ago

ndushay commented 1 year ago

Searchworks takes this approach for English-language fields:

        title_245a_exact_search^1000
        title_245a_unstem_search^500
        title_245a_search^75        

for our natural English-language fields, such as

We need to

ndushay commented 1 year ago
  1. use a more aggressive English stemmer (Porter/Snowball, like Searchworks)?
  2. exact-ish matching (precision: exactish, then non-stemmed, then stemmed)

  3. text_ws may not be used anywhere???
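
The precision ordering above (exact-ish, then unstemmed, then stemmed) could be expressed as an eDisMax `qf` string. This is a minimal sketch, assuming the field suffixes and boosts from the Searchworks example earlier in this issue; the helper names `qf_for` and `search_params` are hypothetical and not part of dor_indexing_app or Argo.

```python
# Boosts copied from the Searchworks example quoted in this issue.
# Ordered from most precise flavor to least precise.
FLAVOR_BOOSTS = [
    ("exact_search", 1000),   # exact-ish match
    ("unstem_search", 500),   # tokenized, unstemmed
    ("search", 75),           # tokenized and stemmed
]


def qf_for(base_field):
    """Return a qf fragment covering every flavor of one base field."""
    return " ".join(
        f"{base_field}_{suffix}^{boost}" for suffix, boost in FLAVOR_BOOSTS
    )


def search_params(query, base_fields):
    """Build Solr request parameters for an eDisMax query (hypothetical helper)."""
    return {
        "defType": "edismax",
        "q": query,
        "qf": " ".join(qf_for(field) for field in base_fields),
    }
```

With this shape, a document that matches the exact-ish flavor outranks one that only matches after stemming, which is the intent of the three tiers.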

ndushay commented 1 year ago

Questions:

which tokenizer?

remotely reasonable candidates

which stemmer?

which sort field type / filters

which case folding?

what other filters?
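
To make the questions above concrete, here is one possible set of answers, sketched as Solr Schema API `add-field-type` commands built in Python. This is an assumption for discussion, not the choice made in this issue: the field type names are hypothetical, the exact-ish type is assumed to use a keyword tokenizer, and only lowercasing is shown for case folding. The factory class names are standard Solr ones.

```python
import json


def field_type(name, tokenizer, filters):
    """Build a Schema API `add-field-type` command for a solr.TextField."""
    return {
        "add-field-type": {
            "name": name,
            "class": "solr.TextField",
            "analyzer": {
                "tokenizer": {"class": tokenizer},
                "filters": filters,
            },
        }
    }


LOWERCASE = {"class": "solr.LowerCaseFilterFactory"}
# Snowball's English stemmer; swap in another stemmer to compare behavior.
STEM = {"class": "solr.SnowballPorterFilterFactory", "language": "English"}

# Hypothetical type names, one per flavor discussed in this issue.
FIELD_TYPES = [
    # Whole value is a single token, so only near-identical strings match.
    field_type("text_exactish_sketch", "solr.KeywordTokenizerFactory", [LOWERCASE]),
    # Tokenized on word boundaries, no stemming.
    field_type("text_unstemmed_sketch", "solr.StandardTokenizerFactory", [LOWERCASE]),
    # Tokenized, then stemmed.
    field_type("text_stemmed_sketch", "solr.StandardTokenizerFactory", [LOWERCASE, STEM]),
]


def as_json(command):
    """Serialize one command as the JSON body for a POST to the schema endpoint."""
    return json.dumps(command, indent=2)
```

Each command would be sent as the body of a POST to the collection's `/schema` endpoint; the same analysis chains can equally be written as `<fieldType>` elements in schema.xml.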

NOT:

ndushay commented 1 year ago

FieldTypes:

https://solr.apache.org/guide/8_11/field-types-included-with-solr.html

Deprecated Field Types

"All Trie numeric and date field types have been deprecated in favor of Point field types. Point field types are better at range queries (speed, memory, disk), however simple field:value queries underperform relative to Trie. Either accept this, or continue to use Trie fields. This shortcoming may be addressed in a future release." - https://solr.apache.org/guide/8_11/field-types-included-with-solr.html

ndushay commented 1 year ago

https://solr.apache.org/guide/8_11/field-properties-by-use-case.html

ndushay commented 1 year ago

UUID field type: we don't need it.

Sort fields: ICUCollation field type? "SortableTextField"? "TextField"? Argo only sorts results by druid or by relevance.

docValues for faceting and sorting (highlighting relies on stored fields and optional term vectors rather than docValues); NOT for searching.

Trie fields are deprecated
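
As a rough illustration of the docValues note above, this sketch builds Schema API `add-field` commands for a search field (indexed, no docValues) and a sort/facet field (docValues, not indexed). The field names are hypothetical, and `text_general` and `string` are assumed to be field types like those in Solr's default configset; none of this is taken from the project's schema.

```python
def add_field(name, field_type, *, indexed, doc_values, stored=False):
    """Build a Schema API `add-field` command (hypothetical helper)."""
    return {
        "add-field": {
            "name": name,
            "type": field_type,
            "indexed": indexed,
            "stored": stored,
            "docValues": doc_values,
        }
    }


# Searching needs the inverted index, not docValues.
SEARCH_FIELD = add_field(
    "title_search_sketch", "text_general", indexed=True, doc_values=False
)

# Sorting and faceting read column-oriented docValues.
SORT_FIELD = add_field(
    "title_sort_sketch", "string", indexed=False, doc_values=True
)
```

Keeping the two roles in separate fields means the sort field can use an untokenized type while the search field keeps its full analysis chain.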

ndushay commented 11 months ago

Closing this in favor of existing tickets; the new fields have been set up in schema.xml.