sul-dlss-deprecated / dor_indexing_app

An indexing API for Stanford's Digital Object Repository
https://sul-dlss-deprecated.github.io/dor_indexing_app/
Apache License 2.0

improve title indexing #1044

Closed ndushay closed 8 months ago

ndushay commented 11 months ago

I think we may want title fields like the ones listed below, each analyzed as "exactish", "unstemmed", and "stemmed".

(We currently seem to index only a "display title", stemmed, for searching.)


from sul-dlss/sul-solr-configs/blob/master/searchworks-prod/solrconfig.xml:

      <str name="qf">
        title_245a_exact_search^1000
        title_245a_unstem_search^500
        title_245a_search^75           vern_title_245a_search^75
        title_245_unstem_search^75
        title_245_search^50            vern_title_245_search^50
        title_uniform_unstem_search^50
        title_uniform_search^20        vern_title_uniform_search^20
        title_variant_unstem_search^20
        title_variant_search^15        vern_title_variant_search^15
        title_related_unstem_search^15
        title_related_search^10        vern_title_related_search^10
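
For reading the excerpt above: each qf entry has the form field^boost, and a match in a field with a higher boost contributes more to the relevance score. A minimal sketch that parses a few of the entries and lists them by boost (the variable names are illustrative only, not from any of the repos mentioned here):

```ruby
# A few entries copied from the qf excerpt above.
qf = <<~QF
  title_245a_exact_search^1000
  title_245a_unstem_search^500
  title_245a_search^75           vern_title_245a_search^75
  title_245_unstem_search^75
QF

# Split on whitespace, then split each "field^boost" token into a pair.
boosts = qf.split.map do |token|
  field, boost = token.split('^')
  [field, Integer(boost)]
end.to_h

# List fields from highest to lowest boost.
boosts.sort_by { |_field, boost| -boost }.each do |field, boost|
  puts format('%-28s %d', field, boost)
end
```

The exact match on the short title (245a) is boosted far above the unstemmed and stemmed variants, which is the pattern the proposal below tries to mirror.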

(from sul-dlss/cocina-models/issues/653)

To distinguish between the short main title, 245a, and the full main title, I got this spec from Arcadia:

"245a = title.structuredValue.value with type 'main title' (or just title.value), 245b = title.structuredValue.value with type 'subtitle'; if there are multiple titles, the 245 should have status 'primary' or at least be first"

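A rough sketch of how that spec could be read, assuming cocina titles are represented as plain hashes. The method names (primary_title, part_value, main_title, subtitle) are hypothetical and are not the cocina-models API:

```ruby
# Hypothetical helpers; cocina titles are modeled here as plain hashes.

# Pick the title to treat as the 245: status 'primary' if present, else the first.
def primary_title(titles)
  titles.find { |title| title[:status] == 'primary' } || titles.first
end

# Value of the first structuredValue part with the given type, or nil.
def part_value(title, type)
  part = (title[:structuredValue] || []).find { |candidate| candidate[:type] == type }
  part && part[:value]
end

# 245a: the 'main title' part, falling back to the plain title value.
def main_title(titles)
  title = primary_title(titles)
  return nil if title.nil?

  part_value(title, 'main title') || title[:value]
end

# 245b: the 'subtitle' part, if any.
def subtitle(titles)
  title = primary_title(titles)
  title && part_value(title, 'subtitle')
end

titles = [
  { value: 'An alternative title' },
  { status: 'primary',
    structuredValue: [
      { value: 'Moby Dick', type: 'main title' },
      { value: 'or, The whale', type: 'subtitle' }
    ] }
]

puts main_title(titles) # => Moby Dick
puts subtitle(titles)   # => or, The whale
```

Here the second title wins because it has status 'primary'; without that status the first title in the list would be used.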

I propose we use Solr title fields along these lines:

main_title_exact (new main_title from cocina titlebuilder)
main_title_unstemmed
main_title (stemmed)

full_title_unstemmed (strategy "first" for cocina titlebuilder?)
full_title (stemmed)

other_titles_unstemmed (strategy "all" for cocina titlebuilder, less the overlap with "first"?)
other_titles (stemmed)
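
A sketch of what populating these fields might look like. Assumptions: titles are plain hashes rather than cocina objects, the helper methods are hypothetical stand-ins for the cocina titlebuilder, parts are joined with a single space (no punctuation handling), and the exact/unstemmed/stemmed difference comes entirely from the Solr field types, so each variant receives the same string:

```ruby
# Hypothetical stand-ins for the cocina titlebuilder; not its real API.

# Join all structuredValue parts, or fall back to the plain value.
def full_title(title)
  parts = title[:structuredValue]
  return title[:value] if parts.nil? || parts.empty?

  parts.map { |part| part[:value] }.join(' ')
end

# Only the 'main title' part, or the plain value if there is none.
def short_title(title)
  main = (title[:structuredValue] || []).find { |part| part[:type] == 'main title' }
  main ? main[:value] : title[:value]
end

# Build the proposed Solr fields from a list of titles.
def title_solr_fields(titles)
  return {} if titles.empty?

  # Assumed reading of strategy "first": the primary title if flagged, else the first.
  first = titles.find { |title| title[:status] == 'primary' } || titles.first
  main = short_title(first)
  full = full_title(first)
  # Strategy "all", less the overlap with "first".
  others = titles.reject { |title| title.equal?(first) }.map { |title| full_title(title) }

  {
    'main_title_exact' => main,
    'main_title_unstemmed' => main,
    'main_title' => main,
    'full_title_unstemmed' => full,
    'full_title' => full,
    'other_titles_unstemmed' => others,
    'other_titles' => others
  }
end

doc = title_solr_fields(
  [
    { structuredValue: [
      { value: 'Moby Dick', type: 'main title' },
      { value: 'or, The whale', type: 'subtitle' }
    ] },
    { value: 'An alternative title' }
  ]
)
puts doc['full_title']
```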

ndushay commented 9 months ago

See sul-dlss/cocina-models#657 -- some questions for Arcadia before proceeding further.

ndushay commented 8 months ago

Closing this in favor of #1075.