This seems to have been caused by accidentally analysing `images.identifiers` with a text analyzer rather than a keyword analyzer: https://github.com/wellcomecollection/catalogue-pipeline/blob/main/index_config/mappings.works_indexed.2024-01-09.json#L298-L301

The use of a word delimiter token filter in this text analysis means that the Miro IDs of the Royal Veterinary College collection get split into 2 tokens: `A0000856` becomes `[A, 0000856]`. Lots of queries for string literals from the catalogue (as opposed to keyword searches) happen to contain `A` (as in the single-letter English-language article), and as ID searches are strongly boosted these searches were returning the RVC images.
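The effect can be illustrated with a rough Python sketch. This is not the Elasticsearch analysis chain: `split_like_word_delimiter` is a hypothetical stand-in that only imitates splitting at letter/digit boundaries, and the example query string is invented for illustration.

```python
import re


def split_like_word_delimiter(token: str) -> list[str]:
    """Hypothetical stand-in for a word delimiter filter:
    split a token into runs of letters and runs of digits."""
    return re.findall(r"[A-Za-z]+|[0-9]+", token)


def keyword_tokens(value: str) -> list[str]:
    """A keyword-style analysis keeps the whole value as one token."""
    return [value]


miro_id = "A0000856"

# Invented example of a catalogue string literal that starts with
# the English article "A".
query_terms = "A view of the veterinary college".split()

text_tokens = split_like_word_delimiter(miro_id)  # ID broken into parts
kw_tokens = keyword_tokens(miro_id)               # ID kept whole

# Terms shared between the query and the indexed identifier tokens.
text_overlap = set(query_terms) & set(text_tokens)
keyword_overlap = set(query_terms) & set(kw_tokens)

print(text_tokens, text_overlap, keyword_overlap)
```

With the text-style split, the query term `A` matches one of the identifier's tokens; with the identifier kept as a single keyword token there is no overlap, so the boosted ID match would not fire for such queries.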