wellcomecollection / catalogue-api

:crystal_ball: The API for searching the Wellcome Collection catalogue.
https://developers.wellcomecollection.org
MIT License

invert the minimum_should_match percentage #632

Closed harrisonpim closed 1 year ago

harrisonpim commented 1 year ago

The recent changes to the query have made our matching logic a bit too permissive.

From a feedback email:

We looked at searches for:

  • Mary Barnes (sorted for Newest first), that retrieved over 16,000 records, including everybody called Mary or Marie or Marya;
  • stereoscopic photographs (sorted for oldest first), which retrieves over 26,000 results including three 15th century manuscripts, none of which appear to have any relevance to the search term;

As a quick fix, we can invert the minimum_should_match percentage from 80% to -20%.
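
For illustration only, here is a rough sketch of where that setting sits in an Elasticsearch `match` query body. The `build_query` helper and the `title` field are hypothetical; the real catalogue query is more involved and is not reproduced here.

```python
def build_query(search_terms: str, minimum_should_match: str = "-20%") -> dict:
    """Build a minimal Elasticsearch `match` query body.

    `title` is a placeholder field name, not the catalogue's actual mapping.
    """
    return {
        "query": {
            "match": {
                "title": {
                    "query": search_terms,
                    # Negative percentage: the share of tokens allowed to be missing.
                    "minimum_should_match": minimum_should_match,
                }
            }
        }
    }


if __name__ == "__main__":
    import json

    print(json.dumps(build_query("Mary Barnes"), indent=2))
```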

This should mean that instead of requiring only 1 token to be present in a 2-token query, or 2 tokens in a 3-token query, etc., we will require all terms to be present in short queries. Set at -20%, the minimum_should_match condition only relaxes matching for queries longer than 4 tokens: from 5 tokens upwards, 20% of the tokens (rounded down) are allowed to be missing.
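
To make the rounding concrete, here is a small sketch of the arithmetic as described in the Elasticsearch docs. `required_matches` is a hypothetical helper written for this comment, not part of the API codebase, and it models only the percentage rounding, not the rest of the query.

```python
def required_matches(num_tokens: int, percentage: int) -> int:
    """Return how many tokens must match for a given percentage setting.

    Positive percentage: that share of the tokens is required, rounded down.
    Negative percentage: that share of the tokens may be missing, rounded down.
    """
    if percentage >= 0:
        return num_tokens * percentage // 100
    allowed_missing = num_tokens * -percentage // 100
    return num_tokens - allowed_missing


if __name__ == "__main__":
    # Compare the old (80%) and new (-20%) settings for short queries.
    for n in range(2, 7):
        print(n, required_matches(n, 80), required_matches(n, -20))
```

With these assumptions, a 2-token query such as "Mary Barnes" needs 1 matching token at 80% but both tokens at -20%, and the two settings only agree again at 5 tokens (4 required in both cases).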

See the docs for the minimum_should_match parameter for a more detailed explanation.

harrisonpim commented 1 year ago

why is our indian journal of medical research 1930-1931 test failing again 🤔