wellcomecollection / catalogue-api

The API for searching the Wellcome Collection catalogue.
https://developers.wellcomecollection.org
MIT License

Works aggregations without a query term are not filtered for visibility #711

Closed: paul-butcher closed this 11 months ago

paul-butcher commented 1 year ago

This is a minor buglet, as non-visible documents shouldn't have any values to aggregate, but the aggregations might be slightly more efficient if we fix it.

Currently, in a Works search, the visibility filter is treated as part of the query that looks for the query term.

If there is no query term, then that query is omitted, including the visibility filter.

This means that the aggregations are operating over a document set containing 1.7 million more records than they need to.
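To make the behaviour described above concrete, here is a minimal sketch of how a request body ends up without the visibility filter when there is no query term. It is written in Python purely for illustration; the field names (`type`, `title`, `format`) and the function name are assumptions, not the API's actual code or mapping.

```python
from typing import Any, Dict, Optional

# Hypothetical visibility clause; the real field and value may differ.
VISIBILITY_FILTER = {"term": {"type": "Visible"}}


def build_search_body_current(query_term: Optional[str]) -> Dict[str, Any]:
    """Mimic the reported behaviour: the visibility filter lives inside the
    text query, so it disappears whenever the text query is omitted."""
    body: Dict[str, Any] = {
        # Hypothetical aggregation, present with or without a query term.
        "aggs": {"format": {"terms": {"field": "format"}}}
    }
    if query_term:
        body["query"] = {
            "bool": {
                "must": [
                    {"multi_match": {"query": query_term, "fields": ["title"]}}
                ],
                "filter": [VISIBILITY_FILTER],
            }
        }
    # With no query term there is no "query" key at all, so the
    # aggregations run over every document in the index.
    return body
```

Calling `build_search_body_current(None)` returns a body with `aggs` but no `query`, which is the situation this issue describes.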

paul-butcher commented 12 months ago

It may be best to fix this as part of https://github.com/wellcomecollection/catalogue-api/issues/677

This requires adding the filter (which is currently in the query document) to the template/params, so that it is inserted when there is no query term.

The better solution would be for the template to insert the query document (without this filter) into a must clause in a top-level bool, with the (pre) filters in a filter clause.
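A rough sketch of that shape, assuming the request body is assembled in code rather than in a search template: the query document (if any) goes into `must`, and the pre-filters always go into `filter`. The function and parameter names here are hypothetical, as is the visibility clause in the usage note.

```python
from typing import Any, Dict, List, Optional


def build_search_body(
    query_document: Optional[Dict[str, Any]],
    pre_filters: List[Dict[str, Any]],
    aggs: Dict[str, Any],
) -> Dict[str, Any]:
    """Wrap the optional query document in a top-level bool so that the
    pre-filters apply whether or not a query term was supplied."""
    bool_query: Dict[str, Any] = {"filter": list(pre_filters)}
    if query_document is not None:
        # The query document no longer carries the visibility filter itself.
        bool_query["must"] = [query_document]
    return {"query": {"bool": bool_query}, "aggs": aggs}
```

For example, `build_search_body(None, [{"term": {"type": "Visible"}}], aggs)` still produces a `query` with the filter clause, so the aggregations only see visible documents.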

paul-butcher commented 12 months ago

At some other time, this may be a job for a more complex ingestor and the constant_keyword feature (and a multi-index alias?).

Rather than keeping these 1.7M documents in the indexed index, we could only put visible ones in there, and put the non-visible ones elsewhere.

We would have to handle the case where a quondam visible document becomes non-visible (suppress/delete/merge). This would require a deletion from the visible index alongside the insertion into the non-visible index.
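As a sketch of what that paired delete and insert could look like, the following builds the newline-delimited lines for an Elasticsearch bulk request. The index names and the function are hypothetical, and this assumes the ingestor already knows whether the incoming document is visible.

```python
import json
from typing import Any, Dict, List

# Hypothetical index names for the two-index layout.
VISIBLE_INDEX = "works-visible"
NON_VISIBLE_INDEX = "works-non-visible"


def bulk_lines_for(work_id: str, source: Dict[str, Any], visible: bool) -> List[str]:
    """Return bulk-API lines that index the document into the index matching
    its current visibility and delete any copy from the other index."""
    if visible:
        target, other = VISIBLE_INDEX, NON_VISIBLE_INDEX
    else:
        target, other = NON_VISIBLE_INDEX, VISIBLE_INDEX
    return [
        # Remove a stale copy left over from the document's previous state.
        json.dumps({"delete": {"_index": other, "_id": work_id}}),
        # Index action line, followed by the document source line.
        json.dumps({"index": {"_index": target, "_id": work_id}}),
        json.dumps(source),
    ]
```

Issuing the delete unconditionally keeps the ingestor stateless: it does not need to know whether the document was previously visible, at the cost of one extra action per document.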

paul-butcher commented 11 months ago

Actually, they are filtered for visibility, but with a clause in each filter aggregation, which is inefficient. I have implemented https://github.com/wellcomecollection/catalogue-api/issues/677 in https://github.com/wellcomecollection/catalogue-api/pull/717, which fixes that.
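For illustration only, this is roughly what "a clause in each filter aggregation" means and how the repeated clause can be removed once it is applied in the main query instead. The aggregation layout, the clause, and the helper name are assumptions and are not taken from the pull request.

```python
import copy
from typing import Any, Dict

# Hypothetical visibility clause that was repeated in every filter aggregation.
VISIBILITY = {"term": {"type": "Visible"}}


def strip_clause(aggs: Dict[str, Any], clause: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `aggs` in which `clause` has been removed from the
    bool.filter list of every filter aggregation."""
    stripped = copy.deepcopy(aggs)
    for agg in stripped.values():
        clauses = agg.get("filter", {}).get("bool", {}).get("filter")
        if isinstance(clauses, list):
            agg["filter"]["bool"]["filter"] = [c for c in clauses if c != clause]
    return stripped
```

The stripped aggregations would then be paired with a main query that carries the visibility clause once, so it is evaluated a single time rather than once per aggregation.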