psu-libraries / psulib_traject

Penn State University Libraries' Blacklight Catalog Traject Indexer
Apache License 2.0

Research: Resolving Title Browse With Stop Words #491

Open ruthtillman opened 9 months ago

ruthtillman commented 9 months ago

(This affects both Blacklight and Traject, but since indexing is on the Traject side, I thought I'd put it here)

We have a partly implemented Title Browse that ran into an issue with how stop words are indexed and handled in code. The case we need to handle: even if we tell users not to type A/An/The, it can be hard for a user to avoid doing so when the article is an integral part of the title.

Easy to ignore: Politics of Mass Digitization

Hard to ignore: An American Marriage

Initial Questions

  1. What would it take to redo the title index and search with a set of common stop words? Because of ordering/alphabetization, I'd anticipate having to build the index with a set of stop words stripped, and strip the same words from searches in Title Browse.
  2. How easy/hard would it be to display the full title in the link vs. the stripped title?
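
As a rough illustration of question 1, the same normalization could be applied on both sides: once when building the browse index and once to whatever the user types. The sketch below is only an assumption about how that might look, not the project's implementation. The stop-word list and the `browse_key` helper are hypothetical names.

```python
import re

# Hypothetical stop-word list; the real list would be a project decision.
LEADING_STOP_WORDS = ("a", "an", "the")

# Matches one leading stop word followed by whitespace, case-insensitively.
_LEADING = re.compile(r"^(?:%s)\s+" % "|".join(LEADING_STOP_WORDS), re.IGNORECASE)


def browse_key(title: str) -> str:
    """Return a lowercase browse key with one leading stop word removed."""
    cleaned = title.strip()
    stripped = _LEADING.sub("", cleaned, count=1)
    # Keep the original text if stripping would leave nothing.
    return (stripped or cleaned).lower()


# The same function is used for the indexed title and for the user's query,
# so both land on the same key.
indexed = browse_key("An American Marriage")
query = browse_key("american marriage")
```

Because both the indexed value and the query pass through one function, a user who types the leading article and a user who omits it reach the same place in the alphabetical list.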

ajkiessl commented 9 months ago

https://github.com/psu-libraries/psulib_blacklight/pull/1254 fixes question 1 in the original post.

Still looking for a potential solution to question 2. This is difficult because Title Browse is implemented as a facet search shared with other browse features. A facet search returns each matched value and the number of documents that match it. It returns nothing else from the documents, which makes it hard to determine the full title of the items we are matching.
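
To make the limitation concrete, here is a small sketch of what a facet response typically carries. It assumes Solr's flat JSON facet layout, in which values and counts alternate in one list. The field name `title_browse` and the sample values are hypothetical.

```python
from typing import List, Tuple


def parse_facet_field(flat: list) -> List[Tuple[str, int]]:
    """Pair up a flat [value, count, value, count, ...] facet list."""
    return list(zip(flat[0::2], flat[1::2]))


# Hypothetical response fragment; "title_browse" is an assumed field name.
response = {
    "facet_counts": {
        "facet_fields": {
            "title_browse": [
                "american marriage", 3,
                "politics of mass digitization", 1,
            ]
        }
    }
}

entries = parse_facet_field(
    response["facet_counts"]["facet_fields"]["title_browse"]
)
# Each entry is only (indexed value, document count); the documents' full
# titles are not part of the facet data.
```

Each entry holds just the indexed value and a count, so if the indexed value is the stripped title, the display title has to come from somewhere other than the facet data itself.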

ajkiessl commented 9 months ago

Reopening since the second part is not done yet.