johnjung / bmrcportal

GNU General Public License v3.0

Enrich data results by "forgiving" periods at end of cataloguing terms? #9

Closed MomoMoses closed 3 years ago

MomoMoses commented 3 years ago

When browsing topics, organizations, and possibly other facets, terms that are identical except for a trailing period are treated as distinct terms. Can the system ignore this variance in cataloguing practice and "see" them as one term for purposes of counting hits and displaying the lists? Example: "YWCA of Metropolitan Chicago." (1) and "YWCA of Metropolitan Chicago" (2).

Ideally these would be combined and show (3) hits; my preference would be to display without the period. Alternatively, we might consider cleaning up the data to remove periods from entries.
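
A minimal sketch of the merge requested here, assuming facet counts are available as a plain mapping from term to hit count. The portal's actual data structures are not shown in this thread, and `merge_trailing_periods` is a hypothetical name used only for illustration. Terms that differ only by one trailing period are folded together and displayed without the period:

```python
from collections import Counter

def merge_trailing_periods(facet_counts):
    """Combine counts for terms that differ only by one trailing period."""
    merged = Counter()
    for term, count in facet_counts.items():
        # Drop a single trailing period, if present, so both spellings share a key.
        key = term[:-1] if term.endswith(".") else term
        merged[key] += count
    return dict(merged)

counts = {
    "YWCA of Metropolitan Chicago.": 1,
    "YWCA of Metropolitan Chicago": 2,
}
print(merge_trailing_periods(counts))
# → {'YWCA of Metropolitan Chicago': 3}
```

This naive version strips every trailing period, including ones that are a legitimate part of a term, so it would need refinement before use on real data.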

johnjung commented 3 years ago

My preference is to clean things like this up in the XML itself, if possible, since improvements there will affect any system that uses these finding aids, and not just the portal. (The finding aids will also probably outlive any system they're being displayed in, so hopefully improvements to them are a better investment than improvements to any website code.)

If we do decide to fix this in the portal code, there are at least two issues to deal with. First, some facets, like "Washington, D.C.", legitimately end in a period, so any code that strips trailing punctuation needs a list of such terms to leave untouched.
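
As a rough illustration of that first issue, here is a sketch of a normalizer with an exception list. The names `KEEP_TRAILING_PERIOD` and `normalize_facet` are hypothetical and not taken from the portal code; the real list of protected terms would have to be compiled from the data itself:

```python
# Hypothetical exception list: terms whose trailing period is part of the term.
KEEP_TRAILING_PERIOD = frozenset({"Washington, D.C."})

def normalize_facet(term, keep=KEEP_TRAILING_PERIOD):
    """Strip one trailing period unless the term is on the exception list."""
    term = term.strip()
    if term in keep:
        return term
    return term[:-1] if term.endswith(".") else term

print(normalize_facet("YWCA of Metropolitan Chicago."))  # period removed
print(normalize_facet("Washington, D.C."))               # left unchanged
```

An exact-match list is the simplest option, but it has to be maintained by hand as new abbreviations show up in the finding aids.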

The second issue relates to the links from these terms in finding aids to new searches. If the portal strips punctuation from facets, a highlighted search term in a finding aid could still contain a trailing period while the corresponding facet doesn't. If we link those highlighted terms to search results via facets, we would need to watch for this mismatch, since it would be easy to create links for highlighted terms that return no results when clicked.
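
One way to guard against such dead links, sketched under assumptions: the `/search/` path, the `f` query parameter, and the `facet_link` helper are all hypothetical stand-ins, not the portal's real URL scheme. The idea is to try the term as written first, then without its trailing period, and only emit a link when a matching facet is known to exist:

```python
from urllib.parse import urlencode

def facet_link(highlighted_term, known_facets, base="/search/"):
    """Return a search URL for a highlighted term, or None if no facet matches."""
    candidates = [highlighted_term]
    if highlighted_term.endswith("."):
        # Also try the spelling without the trailing period.
        candidates.append(highlighted_term[:-1])
    for candidate in candidates:
        if candidate in known_facets:
            return base + "?" + urlencode({"f": candidate})
    return None  # caller should render plain text instead of a link

facets = {"YWCA of Metropolitan Chicago", "Washington, D.C."}
print(facet_link("YWCA of Metropolitan Chicago.", facets))
# → /search/?f=YWCA+of+Metropolitan+Chicago
```

Checking the unmodified term first means facets that keep their period still resolve correctly.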

Let me know what you think about this. It's possible to treat this as a code issue, but it gets a little involved. If we choose to spend development time here, it will leave less time for improvements in other places.

MomoMoses commented 3 years ago

Agreed to fix in data cleanup efforts rather than in development time. :)