ministryofjustice / find-moj-data

Find MOJ data service • This repository is defined and managed in Terraform
https://find-moj-data.service.justice.gov.uk/
MIT License

Search using acronyms #952

Closed MatMoore closed 3 days ago

MatMoore commented 2 weeks ago

User Story

As a lover of acronyms (LoA), I want to be able to search for my TLAs rather than spell out every single word (SOESW).

Value / Purpose

People refer to things in different ways, and in general I think we can expect data consumers to use different terminology to the people writing the metadata. Users do not expect to have to adapt their language use just for a search engine.

An example: while working on the CJS dashboard, I noticed that it's not actually findable by searching for "CJS", because I didn't put the acronym in the description. To find it, I had to search for "Criminal Justice System" in full.

We don't have full control over the titles & descriptions, but we should be able to configure the search engine to understand common jargon.

Useful Contacts

No response

User Types

No response

Hypothesis

If we set up synonyms based on common acronyms, then click-through rates will go up.

Proposal

Take the list from https://github.com/ministryofjustice/acronyms and configure OpenSearch to treat the acronym and expanded form as synonyms.
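A minimal sketch of the proposal, assuming the acronym list has already been loaded into a Python mapping of acronym to expansion (the actual file format of the acronyms repository is not shown here). It turns each pair into a Solr-format equivalence rule such as `cjs, criminal justice system` and wraps the rules in an OpenSearch `synonym_graph` token filter definition. The filter name `acronym_synonyms` and the helper names are hypothetical.

```python
import json


def to_synonym_rules(acronyms: dict[str, str]) -> list[str]:
    """Turn {acronym: expansion} pairs into Solr-format equivalence rules.

    A rule like "cjs, criminal justice system" tells the synonym filter to
    treat every comma-separated term on that line as interchangeable.
    """
    rules = []
    for acronym, expansion in sorted(acronyms.items()):
        # Commas separate terms in this rule format, so drop any inside the expansion
        # and collapse the resulting extra whitespace.
        cleaned = " ".join(expansion.replace(",", " ").lower().split())
        rules.append(f"{acronym.strip().lower()}, {cleaned}")
    return rules


def synonym_filter(acronyms: dict[str, str]) -> dict:
    """Build a hypothetical 'acronym_synonyms' token filter for index settings."""
    return {
        "acronym_synonyms": {
            "type": "synonym_graph",
            "synonyms": to_synonym_rules(acronyms),
        }
    }


if __name__ == "__main__":
    sample = {
        "CJS": "Criminal Justice System",
        "HMPPS": "His Majesty's Prison and Probation Service",
    }
    print(json.dumps(synonym_filter(sample), indent=2))
```

Equivalence rules (rather than one-way `cjs => criminal justice system` mappings) mean a search for either form should match documents that use the other.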

Additional Information

No response

Definition of Done

MatMoore commented 3 days ago

Slack thread about stemming and synonym support: https://datahubspace.slack.com/archives/CV2UVAPPG/p1693836742497149

However, synonyms are not intended to be customisable in DataHub (Slack thread), so they are not part of the custom search_config.yaml we just added. The only way to change them is to rebuild the metadata-io module with an edited synonyms file.

The other approach is to apply the synonyms at search time, as described in the Elasticsearch synonyms guide, but DataHub does not currently support this at all.
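For illustration only, a sketch of what search-time synonyms look like in a plain OpenSearch index body; this is not something DataHub exposes, and the field name `description`, the analyzer name `acronym_search` and the filter name `acronym_synonyms` are hypothetical. Documents are indexed with the `standard` analyzer, and the `synonym_graph` filter runs only in the `search_analyzer`, so the stored tokens do not depend on the synonym list and changing it does not require reindexing the documents.

```python
def search_time_synonym_settings(rules: list[str], field: str = "description") -> dict:
    """Return a hypothetical index body that applies synonyms only at query time."""
    return {
        "settings": {
            "analysis": {
                "filter": {
                    # Solr-format rules, e.g. "cjs, criminal justice system".
                    "acronym_synonyms": {"type": "synonym_graph", "synonyms": rules},
                },
                "analyzer": {
                    "acronym_search": {
                        "type": "custom",
                        "tokenizer": "standard",
                        # Lowercase first so "CJS" in a query matches the "cjs" rule.
                        "filter": ["lowercase", "acronym_synonyms"],
                    },
                },
            },
        },
        "mappings": {
            "properties": {
                field: {
                    "type": "text",
                    # Index-time analysis is unchanged; synonyms expand the query only.
                    "analyzer": "standard",
                    "search_analyzer": "acronym_search",
                },
            },
        },
    }


if __name__ == "__main__":
    import json

    print(json.dumps(search_time_synonym_settings(["cjs, criminal justice system"]), indent=2))
```

`synonym_graph` is used here because it handles multi-word synonyms such as "criminal justice system" correctly at query time.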

MatMoore commented 3 days ago

Decided not to implement the workaround, as it is not a supported way of using DataHub and would complicate deployment (Slack thread).

I've reached out to the DataHub team about making this externally configurable; if that gets added, we could revisit this in a future sprint.