scientist-softserv / iiif_print

A gem for Hyrax/Samvera for displaying PDF pages in a IIIF Compliant viewer
Apache License 2.0

Add configuration for all_text indexing #228

Closed kirkkwang closed 1 year ago

kirkkwang commented 1 year ago

This commit adds a configuration for the all_text_* fields. Essi does not use all_text_tsimv or all_text_timv, so the full text catalog search would not work without it. By default, IIIF Print captures the full text by reading the txt file generated by the TextExtractionDerivativeService. Essi does not use that service and instead has its own implementation that generates the ALTO XML. We leverage that implementation here and use a lambda to extract the full text from the ALTO XML.
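For reviewers unfamiliar with ALTO, here is a rough sketch of what such a lambda could look like. It is not the code in this PR: the constant name `EXTRACT_ALTO_TEXT` and the sample document are made up for illustration, it uses REXML rather than whatever parser Essi relies on, and it does not show how the lambda is registered with IIIF Print. It only assumes the standard ALTO layout, where each word is a `String` element with a `CONTENT` attribute inside a `TextLine`.

```ruby
require 'rexml/document'

# Hypothetical lambda (not the one in this PR): takes an ALTO XML string
# and returns plain text, one output line per ALTO <TextLine>.
EXTRACT_ALTO_TEXT = lambda do |alto_xml|
  doc = REXML::Document.new(alto_xml)
  lines = []

  # Walk every descendant element in document order.
  doc.root.each_recursive do |element|
    next unless element.name == 'TextLine'

    # Words are the <String> children; their text lives in CONTENT.
    words = element.children
                   .grep(REXML::Element)
                   .select { |child| child.name == 'String' }
                   .map { |child| child.attributes['CONTENT'] }
                   .compact
    lines << words.join(' ') unless words.empty?
  end

  lines.join("\n")
end

# Made-up sample document for illustration only.
SAMPLE_ALTO = <<~XML
  <alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
    <Layout><Page><PrintSpace><TextBlock>
      <TextLine>
        <String CONTENT="Hello"/><SP/><String CONTENT="world"/>
      </TextLine>
      <TextLine>
        <String CONTENT="Second"/><SP/><String CONTENT="line"/>
      </TextLine>
    </TextBlock></PrintSpace></Page></Layout>
  </alto>
XML

puts EXTRACT_ALTO_TEXT.call(SAMPLE_ALTO)
```

Keeping the extraction in a lambda means the indexer only needs something that responds to `call`, so an application can swap in its own text source without touching the indexing code.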

Story

Essi no longer uses TextExtractionDerivativeService, but it still needs the all_text_t* fields indexed to perform full text catalog searches.

Refs https://github.com/scientist-softserv/essi/issues/8

Expected Behavior Before Changes

How the all_text_* fields get their full text was not configurable.

Expected Behavior After Changes

How the all_text_* fields get their full text is now configurable.