Closed MatMoore closed 3 days ago
Slack thread about stemming and synonyms support https://datahubspace.slack.com/archives/CV2UVAPPG/p1693836742497149
Datahub does not actually intend this to be customisable though (slack thread), so it is not part of the custom search_config.yaml we just added. The only way to change it is to rebuild the metadata-io module with an edited synonyms file.
The other approach is to apply the synonyms at search time, as described in the Elasticsearch synonyms guide, but Datahub does not currently support this at all.
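For reference, search-time synonyms in plain Elasticsearch/OpenSearch look roughly like the sketch below. This is not something Datahub exposes. The function name, the filter and analyzer names (`acronym_synonyms`, `acronym_search`) and the `description` field are all made up for illustration; only the `synonym_graph` filter type and the `search_analyzer` mapping option come from the search engine itself.

```python
import json


def search_time_synonym_body(rules, field="description"):
    """Build an index-creation body that expands synonyms only at query time.

    `rules` is a list of Solr-format synonym lines, e.g.
    "cjs, criminal justice system". All names here are hypothetical.
    """
    return {
        "settings": {
            "analysis": {
                "filter": {
                    # synonym_graph handles multi-word synonyms at search time
                    "acronym_synonyms": {
                        "type": "synonym_graph",
                        "synonyms": list(rules),
                    }
                },
                "analyzer": {
                    "acronym_search": {
                        "tokenizer": "standard",
                        "filter": ["lowercase", "acronym_synonyms"],
                    }
                },
            }
        },
        "mappings": {
            "properties": {
                field: {
                    "type": "text",
                    # documents are indexed without synonym expansion
                    "analyzer": "standard",
                    # queries are expanded with the synonym rules
                    "search_analyzer": "acronym_search",
                }
            }
        },
    }


body = search_time_synonym_body(["cjs, criminal justice system"])
print(json.dumps(body, indent=2))
```

Because the synonyms only sit on the search analyzer, changing the list would not need a reindex of existing documents, which is the main attraction of this approach.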
Decided not to do the workaround as it's not a supported way of using Datahub, and would complicate deployment (slack thread)
I've reached out to Datahub about the possibility of making this externally configurable, so if that gets added we could revisit this in a future sprint.
User Story
As a lover of acronyms (LoA), I want to be able to search for my TLAs rather than spell out every single word (SOESW).
Value / Purpose
People refer to things in different ways, and in general I think we can expect data consumers to use different terminology to the people writing the metadata. Users do not expect to have to adapt their language use just for a search engine.
An example: while working on the CJS dashboard, I noticed that it's not actually findable by searching for CJS, because I didn't put the acronym in the description. This meant I needed to spell out "Criminal Justice System".
We don't have full control over the titles & descriptions, but we should be able to configure the search engine to understand common jargon.
Useful Contacts
No response
User Types
No response
Hypothesis
If we set up synonyms based on common acronyms, then click-through rates will go up.
Proposal
Take the list from https://github.com/ministryofjustice/acronyms and configure OpenSearch to treat the acronym and expanded form as synonyms.
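A minimal sketch of the conversion step, assuming the acronym list has already been loaded into a mapping of acronym to expanded form. The format of the acronyms repository is not described here, so the input dict and the function name `build_synonym_rules` are hypothetical. The output uses the Solr-style `a, b` rule format that OpenSearch synonym filters accept.

```python
def build_synonym_rules(acronyms):
    """Turn {acronym: expansion} pairs into Solr-format synonym rules.

    Each rule has the form "short, long", which treats both terms as
    equivalent. The input mapping is a hypothetical stand-in for the
    real acronym list.
    """
    rules = []
    for acronym, expansion in sorted(acronyms.items()):
        short = acronym.strip().lower()
        # Commas separate terms in the Solr format, so drop any that
        # appear inside an expansion, and collapse repeated whitespace.
        long_form = " ".join(expansion.replace(",", " ").split()).lower()
        # Skip blank entries and entries where both sides are identical.
        if short and long_form and short != long_form:
            rules.append(f"{short}, {long_form}")
    return rules


example = {
    "TLA": "Three Letter Acronym",
    "CJS": "Criminal Justice System",
}
for rule in build_synonym_rules(example):
    print(rule)
```

Lowercasing the rules assumes the analyzer applies a lowercase filter before the synonym filter. If it does not, the rules would need to keep their original case.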
Additional Information
No response
Definition of Done