sul-dlss-deprecated / dor_indexing_app

An indexing API for Stanford's Digital Object Repository
https://sul-dlss-deprecated.github.io/dor_indexing_app/
Apache License 2.0

populate catch-all field with all useful human readable text in cocina descriptive #1029

Closed ndushay closed 11 months ago

ndushay commented 11 months ago

We need to add a new field to our Solr documents, all_text_timv, that is a "catch all" field for any cocina descriptive text that users might search on.

This should include the values (not the keys) from our cocina descriptive JSON:

And, of course, recurse through structuredValues, parallelValues, and groupedValues.

do NOT recurse through or include

Do this for these fields in the Cocina:

See https://github.com/sul-dlss/dor-services-app/issues/4522 for more info. Note that we don't want to include "uri", "valueAt", or "code" (which is also a URI).
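To make the request concrete, here is a rough sketch of the kind of traversal described above. It is not the dor_indexing_app implementation. The helper names `collect_values` and `descriptive_text` are hypothetical, the nested keys are assumed to be spelled `structuredValue`, `parallelValue`, and `groupedValue` in the JSON, and the list of top-level fields is passed in by the caller because it is not spelled out here. Only `value` strings are read, so `uri`, `valueAt`, and `code` never reach the output.

```ruby
# Keys assumed to hold nested arrays of descriptive values (assumption:
# singular spellings as they appear in the cocina JSON).
NESTED_KEYS = %w[structuredValue parallelValue groupedValue].freeze

# Hypothetical helper: walk one descriptive node (Hash or Array) and
# collect every non-empty "value" string, recursing into nested values.
# Keys such as "uri", "valueAt" and "code" are never read.
def collect_values(node, acc = [])
  case node
  when Array
    node.each { |child| collect_values(child, acc) }
  when Hash
    value = node['value']
    acc << value if value.is_a?(String) && !value.empty?
    NESTED_KEYS.each { |key| collect_values(node[key], acc) if node.key?(key) }
  end
  acc
end

# Hypothetical helper: join the collected text for the given top-level
# descriptive fields into one string suitable for a catch-all field.
def descriptive_text(descriptive, fields)
  fields.flat_map { |field| collect_values(descriptive[field]) }.join(' ')
end
```

Something like `descriptive_text(cocina_descriptive_hash, %w[title subject])` would then produce the string to index, where the field list shown is only an example.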

ndushay commented 11 months ago

NOTE: I have changes in flight for dor_indexing_app, so please let me know if this triggers serious refactoring.

ndushay commented 11 months ago

fields:

      # all the descriptive data that we want to search on, with different flavors for better recall and precision
      'descriptive_tiv' => all_search_text, # ICU tokenized, ICU folded
      'descriptive_text_nostem_i' => all_search_text, # whitespace tokenized, ICU folded, word delimited
      'descriptive_teiv' => all_search_text # ICU tokenized, ICU folded, minimal stemming

NEW fields

image image image

HOW MANY should there be?

Number ... registered?

image

Number ... accessioned?

image

Number with ids?

image

ndushay commented 11 months ago

Andrew says: just go ahead with this one. Not sure what to check with qttest.