wellcomecollection / catalogue-api

:crystal_ball: The API for searching the Wellcome Collection catalogue.
https://developers.wellcomecollection.org
MIT License

Don't show unrelated images of veterinary surgery #773

Closed by jamieparkinson 4 months ago

jamieparkinson commented 5 months ago

This seems to have been caused by accidentally analysing images.identifiers with a text analyzer rather than a keyword analyzer: https://github.com/wellcomecollection/catalogue-pipeline/blob/main/index_config/mappings.works_indexed.2024-01-09.json#L298-L301

This text analysis includes a word delimiter token filter, so the Miro IDs of the Royal Veterinary College (RVC) collection get split into two tokens: A0000856 becomes [A, 0000856]. Many queries that are string literals from the catalogue (as opposed to keyword searches) happen to contain "A" (the single-letter English article), and because ID matches are strongly boosted, these searches were returning the RVC images.
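To make the failure mode concrete, here is a rough Python sketch of the tokenisation described above. It is a simplified simulation, not the Elasticsearch analysis chain: the function names are hypothetical, the letter/digit splitting only approximates what a word delimiter filter does, and the lowercasing step is an assumption that is not stated in this issue.

```python
import re


def split_on_letter_digit_boundaries(token: str) -> list[str]:
    """Approximate a word delimiter filter: break a token into runs of letters and runs of digits."""
    return re.findall(r"[A-Za-z]+|[0-9]+", token)


def analyse_as_text(value: str) -> list[str]:
    """Hypothetical 'text' analysis: whitespace split, letter/digit split, lowercase (assumed)."""
    tokens = []
    for word in value.split():
        tokens.extend(part.lower() for part in split_on_letter_digit_boundaries(word))
    return tokens


def analyse_as_keyword(value: str) -> list[str]:
    """Hypothetical 'keyword' analysis: the whole value is kept as a single token."""
    return [value.lower()]


def matches(query: str, indexed_tokens: set[str], analyse) -> bool:
    """A query matches if any of its tokens equals an indexed token."""
    return any(token in indexed_tokens for token in analyse(query))


miro_id = "A0000856"
query = "A history of surgery"  # illustrative query, not taken from real search logs

# Analysed as text, the ID contributes a bare "a" token that the query also contains.
text_index = set(analyse_as_text(miro_id))
print(analyse_as_text(miro_id), matches(query, text_index, analyse_as_text))

# Analysed as a keyword, the ID stays whole and the query no longer matches it.
keyword_index = set(analyse_as_keyword(miro_id))
print(analyse_as_keyword(miro_id), matches(query, keyword_index, analyse_as_keyword))
```

In this sketch the text-analysed ID matches any query containing the article "a", while the keyword-analysed ID only matches a query that is the full ID, which is the behaviour an identifier field would normally want.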

jcateswellcome commented 5 months ago

Some additional context here: https://wellcome.slack.com/archives/C3TQSF63C/p1704984187112749