Closed paul-butcher closed 5 months ago
Perhaps aids diagnosis vs. AIDS diagnosis
I think this collocation would allow records for the two meanings (aids for diagnosing any ailment vs. how to diagnose AIDS) without significantly favouring one or the other. By contrast, in a contest between a judicial hearing about AIDS and hearing aids, the hearing aids are likely to win a search for "aids hearing" regardless of capitalisation.
The Order test "Capitalised match appears before lower case match" is unreliable across indices. I believe this is because there is so much content about AIDS, all ranked more or less equally for a query on that term, that there is no guarantee any given document containing it will be ranked within the top 100.
Any change to mappings or analysis, or even just a different version of Elasticsearch, could easily shuffle the order of results such that one or both of the expected records drops out of the top 100.
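To make the failure mode concrete, here is a minimal sketch of an order check over a ranked list of record IDs. The function name `order_test`, the record IDs, and the three-way result are hypothetical and are not taken from the actual test suite; the only detail drawn from the discussion above is the top-100 window.

```python
from typing import Optional, Sequence


def order_test(
    ranked_ids: Sequence[str],
    first_id: str,
    second_id: str,
    top_n: int = 100,
) -> Optional[bool]:
    """Check whether first_id ranks above second_id within the top_n results.

    Returns True or False when both records are inside the window, and
    None when either one has dropped out, in which case the ordering
    cannot be judged at all.
    """
    window = list(ranked_ids[:top_n])
    if first_id not in window or second_id not in window:
        return None
    return window.index(first_id) < window.index(second_id)


# Hypothetical index A: both expected records land inside the top 100.
index_a = ["capitalised"] + [f"other-{i}" for i in range(50)] + ["lowercase"]

# Hypothetical index B: the same records, but a reshuffle of the many
# near-equal AIDS results pushes one of them past position 100.
index_b = ["capitalised"] + [f"other-{i}" for i in range(120)] + ["lowercase"]

result_a = order_test(index_a, "capitalised", "lowercase")
result_b = order_test(index_b, "capitalised", "lowercase")
```

In this sketch the relative order of the two records is the same in both indices, yet the second run cannot confirm it, which is the sense in which the test is unreliable rather than simply failing.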
Furthermore, because of the surfeit of AIDS content, I believe that this test is not really examining what it purports to. A search for "aids" rather than "AIDS" still seems to preferentially return "AIDS" records, probably because the case-insensitive match finds the term in more fields, outweighing the boost applied to the case-sensitive match.
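The suspected effect can be illustrated with a toy additive scoring model. This is only a sketch: it is not how Elasticsearch computes relevance, and the boost value, field contents, and the `toy_score` function are all invented for illustration.

```python
from typing import Sequence


def toy_score(fields: Sequence[str], query: str, exact_boost: float = 2.0) -> float:
    """Sum a per-field score: 1.0 for a case-insensitive token match,
    plus exact_boost when the token also matches case-sensitively."""
    score = 0.0
    for value in fields:
        tokens = value.split()
        if query.lower() in [t.lower() for t in tokens]:
            score += 1.0
        if query in tokens:
            score += exact_boost
    return score


# Hypothetical record about AIDS: the term recurs across several fields.
aids_record = ["AIDS diagnosis", "AIDS", "AIDS clinic notes", "history of AIDS"]

# Hypothetical record about diagnostic aids: the term appears in one field.
aids_lowercase_record = ["diagnostic aids"]

# Four case-insensitive hits and no exact-case hit for the query "aids".
score_aids = toy_score(aids_record, "aids")

# One case-insensitive hit plus one boosted exact-case hit.
score_lowercase = toy_score(aids_lowercase_record, "aids")
```

Under these assumed numbers the AIDS record scores 4.0 against 3.0 for the lower-case record, so the lower-case query still prefers the capitalised content. A larger boost would reverse the order in this toy model; whether that holds in the real index depends on the actual mappings and scoring.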