sul-dlss / vt-arclight

An Arclight-based discovery application for materials from the Virtual Tribunals project

Set highlight method to unified and increase maxAnalyzedChars limit #544

Closed: corylown closed this 1 year ago

corylown commented 1 year ago

Fixes #487

This sets the highlight method to unified, which is the default in Solr 9 but not in Solr 8 (which we are running in production). The *_tesimv field is already configured appropriately to take advantage of this more efficient highlighting method; see: https://solr.apache.org/guide/solr/latest/query-guide/highlighting.html#schema-options-and-performance-considerations

By setting hl.maxAnalyzedChars to a high value, we avoid missing highlight matches where the content of the full_text_tesimv field is longer than the default value of 51200 characters. The largest value in full_text_tesimv is roughly 12 million characters long.
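To illustrate why the limit matters, here is a toy Python model of a highlighter that only scans the first N characters of a field. This is not Solr's implementation; the function name `find_match_offsets` and the sample text are hypothetical and exist only to show the effect of an analysis cutoff.

```python
# Solr's documented default for hl.maxAnalyzedChars.
DEFAULT_MAX_ANALYZED_CHARS = 51200


def find_match_offsets(text, term, max_analyzed_chars=DEFAULT_MAX_ANALYZED_CHARS):
    """Toy model: return the offsets of `term` found within the analyzed prefix only."""
    analyzed = text[:max_analyzed_chars]
    offsets = []
    start = analyzed.find(term)
    while start != -1:
        offsets.append(start)
        start = analyzed.find(term, start + 1)
    return offsets


# Hypothetical field value whose only match sits past the default limit.
long_text = "x" * 60000 + " tribunal " + "x" * 1000

# With the default cutoff the match is never seen.
missed = find_match_offsets(long_text, "tribunal")

# With a cutoff at least as long as the field, the match is found.
found = find_match_offsets(long_text, "tribunal", max_analyzed_chars=len(long_text))
```

In this model `missed` is empty while `found` contains the offset of the match, which mirrors the symptom described above: a document matches the query, but no snippet is produced for it.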

Ideally we'd set hl.maxAnalyzedChars to -1 as a shortcut to set the value to the max integer value, but there is a bug in Solr's UnifiedHighlighter that causes an error; see: https://issues.apache.org/jira/browse/SOLR-13121

I reduced hl.fragsize from its default of 100 because, at the default, the resulting snippets were longer than with the previous highlight settings.
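As a rough sketch of the settings described above, the following Python snippet assembles the corresponding Solr highlighting parameters into a query string. The parameter names (`hl.method`, `hl.maxAnalyzedChars`, `hl.fragsize`, `hl.fl`) are standard Solr parameters, but the numeric values, the helper name `highlight_params`, and the sample query are assumptions for illustration; this description does not state the exact values used, and the application may set them in its Solr or Blacklight configuration rather than per request.

```python
from urllib.parse import urlencode

# Assumed values for illustration only; the actual numbers are not given here.
ASSUMED_MAX_ANALYZED_CHARS = 15_000_000  # comfortably above the ~12 million character maximum
ASSUMED_FRAGSIZE = 70                    # any value below the default of 100


def highlight_params(field="full_text_tesimv"):
    """Return Solr highlighting parameters matching the changes described above."""
    return {
        "hl": "true",
        "hl.method": "unified",  # explicit because Solr 8 does not default to unified
        "hl.fl": field,
        "hl.maxAnalyzedChars": str(ASSUMED_MAX_ANALYZED_CHARS),
        "hl.fragsize": str(ASSUMED_FRAGSIZE),
    }


# Hypothetical query combining a search term with the highlight settings.
query_string = urlencode({"q": "tribunal", **highlight_params()})
```

A large explicit value is used for `hl.maxAnalyzedChars` instead of -1 because of the UnifiedHighlighter bug linked above.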