sul-dlss / earthworks

Geospatial discovery application for Stanford University Libraries.
https://earthworks.stanford.edu

Solr Synonyms: Terms with spaces #1110

Closed: hudajkhan closed this issue 2 weeks ago

hudajkhan commented 1 month ago

As mentioned in #1109, synonym lines in the Solr config file may contain multi-word terms whose words are separated by spaces. For multi-way expansion (i.e., comma-separated lines where a search for any term on the line is ORed with all the other terms on that line), these terms are not parsed correctly. For example, the line "Public institutions, buildings" leads to a search for "public" OR "institutions" OR "buildings" instead of "public institutions" OR "buildings".
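To make the difference concrete, here is a minimal Python sketch of the two behaviors described above. It is not how Solr parses the synonyms file. The function names `naive_terms`, `phrase_terms`, and `or_query` are hypothetical and exist only for illustration.

```python
import re

def naive_terms(line):
    """Split on commas AND whitespace, so multi-word terms fall apart."""
    return [t.lower() for t in re.split(r"[,\s]+", line.strip()) if t]

def phrase_terms(line):
    """Split on commas only, so each comma-separated entry stays one term."""
    return [" ".join(t.lower().split()) for t in line.split(",") if t.strip()]

def or_query(terms):
    """Join terms into a quoted OR expression (illustrative, not Solr syntax)."""
    return " OR ".join(f'"{t}"' for t in terms)

line = "Public institutions, buildings"
print(or_query(naive_terms(line)))   # "public" OR "institutions" OR "buildings"
print(or_query(phrase_terms(line)))  # "public institutions" OR "buildings"
```

The first result corresponds to the behavior reported here, the second to the intended one.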

hudajkhan commented 3 weeks ago

The best way to handle multi-term synonyms appears to be the SynonymGraphFilterFactory: https://solr.apache.org/guide/solr/latest/indexing-guide/filters.html#synonym-graph-filter . This filter is also currently in use in SearchWorks, vt arclight, etc., although there it is applied at both index and query time, while our default configuration uses only query-time analysis.
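As a rough illustration of why matching synonyms against token sequences, rather than single tokens, keeps multi-word terms intact, the sketch below does a longest-match lookup over query tokens. It is a conceptual approximation only and does not reproduce the internals of SynonymGraphFilterFactory. The names `build_synonym_map` and `expand` are hypothetical.

```python
def build_synonym_map(lines):
    """Map each term (as a tuple of tokens) to every term on its line."""
    table = {}
    for line in lines:
        group = [tuple(t.lower().split()) for t in line.split(",") if t.strip()]
        for term in group:
            table[term] = group
    return table

def expand(query, table):
    """Scan query tokens left to right, preferring the longest matching term."""
    tokens = query.lower().split()
    max_len = max((len(k) for k in table), default=1)
    out, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in table:
                # Replace the matched span with all alternatives on its line.
                out.append([" ".join(t) for t in table[key]])
                i += n
                break
        else:
            # No synonym entry starts here; keep the token as-is.
            out.append([tokens[i]])
            i += 1
    return out

table = build_synonym_map(["Public institutions, buildings"])
print(expand("public institutions map", table))
```

Each inner list holds the alternatives for one position in the query, so "public institutions" is treated as a single unit that can be swapped for "buildings".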

hudajkhan commented 2 weeks ago

Closed by https://github.com/sul-dlss/sul-solr-configs/pull/342

hudajkhan commented 2 weeks ago

Will close once the changes have been deployed to production. Deployment is a separate final step after the GitHub merge, needed so the changes are actually in the Solr collection.

hudajkhan commented 2 weeks ago

Changes have been deployed to production.