wellcomecollection / catalogue-api

:crystal_ball: The API for searching the Wellcome Collection catalogue.
https://developers.wellcomecollection.org
MIT License

Update WorksQuery.json to include shelfmark #807

kenoir closed this pull request 2 months ago

kenoir commented 2 months ago

What does this change?

This change updates search queries to include the shelfmark value added by https://github.com/wellcomecollection/catalogue-pipeline/pull/2693.

We add a new block to the works query:

      {
        "multi_match": {
          "_name": "ids_with_path_lax",
          "query": "{{query}}",
          "analyzer": "lowercase_whitespace_tokens",
          "fields": ["query.items.shelfmark*", "query.collectionPath*"],
          "type": "cross_fields",
          "boost": 50,
          "operator": "OR",
          "minimum_should_match": 1
        }
      },
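The block is a fragment of a larger query template, and `{{query}}` is a placeholder for the user's search text. The sketch below assumes the placeholder is filled by plain string substitution of a JSON-escaped value; the actual mechanism in catalogue-api may differ. `render_clause` is a hypothetical helper, and the example search string is made up.

```python
import json

# The new clause from WorksQuery.json, with "{{query}}" left as a placeholder.
# Assumption: the placeholder is filled by substituting a JSON-escaped string.
IDS_WITH_PATH_LAX_TEMPLATE = """
{
  "multi_match": {
    "_name": "ids_with_path_lax",
    "query": "{{query}}",
    "analyzer": "lowercase_whitespace_tokens",
    "fields": ["query.items.shelfmark*", "query.collectionPath*"],
    "type": "cross_fields",
    "boost": 50,
    "operator": "OR",
    "minimum_should_match": 1
  }
}
"""


def render_clause(template: str, user_query: str) -> dict:
    """Hypothetical helper: fill {{query}} and parse the result as JSON."""
    # json.dumps returns a quoted, escaped string; drop the outer quotes
    # because the template already supplies them around the placeholder.
    escaped = json.dumps(user_query)[1:-1]
    return json.loads(template.replace("{{query}}", escaped))


clause = render_clause(IDS_WITH_PATH_LAX_TEMPLATE, 'WA/HMM "box" 12')
print(clause["multi_match"]["query"])  # WA/HMM "box" 12
```

Escaping before substitution matters because a search containing a double quote would otherwise produce invalid JSON.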

Matches on multipart IDs (shelfmark and collection path) are boosted less than other ID matches. Small parts of these identifiers can collide with ordinary search terms (for example "of", "wa", "sa"). A strong boost would push works that match only one fragment of an ID above more relevant results, in a way that may confuse users.
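The collision can be illustrated with a toy model. With `"operator": "OR"` and `"minimum_should_match": 1`, a single query token found in either field is enough for the clause to match and apply its boost. The sketch assumes `lowercase_whitespace_tokens` splits on whitespace and lowercases, as its name suggests. It ignores scoring, the `*` subfields, and the details of `cross_fields` term blending, and the records and shelfmarks are hypothetical.

```python
from typing import Dict, List, Set


def lowercase_whitespace_tokens(text: str) -> List[str]:
    # Assumed analyzer behaviour: split on whitespace, lowercase each token.
    return [token.lower() for token in text.split()]


def lax_match(query: str, record: Dict[str, str], fields: List[str]) -> Set[str]:
    """Toy model of cross_fields + OR + minimum_should_match=1.

    Returns the query tokens found in any of the given fields. A non-empty
    result means the clause matches, so its boost applies.
    """
    field_tokens: Set[str] = set()
    for field in fields:
        field_tokens.update(lowercase_whitespace_tokens(record.get(field, "")))
    return {t for t in lowercase_whitespace_tokens(query) if t in field_tokens}


# Hypothetical records: only the shelfmark and collection path values matter.
records = [
    {"id": "r1", "shelfmark": "WA 17", "collectionPath": "WA/HMM/17"},
    {"id": "r2", "shelfmark": "MS 5147", "collectionPath": "MS5147"},
]
fields = ["shelfmark", "collectionPath"]

for record in records:
    print(record["id"], sorted(lax_match("wa reports", record, fields)))
# r1 ['wa']
# r2 []
```

Here the fragment "wa" alone makes `r1` match a search that was probably not about that shelfmark. This is the effect the lower boost is meant to limit.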

The need for this change was revealed by, and the change was tested with, wellcomecollection/rank: https://github.com/wellcomecollection/rank/pull/111/files

[!Note] This PR also changes how collection paths are searched, because they share the same collision issue as shelfmarks described above.

Part of https://github.com/wellcomecollection/catalogue-pipeline/issues/2453

Checklist

How to test

How can we measure success?

Collection staff can more easily find the works they are looking for.

Have we considered potential risks?

To mitigate risk, we should have rank tests in place before merging to ensure this change does not alter search results in unexpected or detrimental ways.