ucldc / public_interface

Calisphere public interface source code (UCLDC Project). The master branch should match the live site.
https://calisphere.org/

Convert the multiple queries for facets with a filter applied into one query (OpenSearch supports this via post_filter + a filter within aggs) #375

Open amywieliczka opened 4 months ago

amywieliczka commented 4 months ago

The query below matches all documents with "solr" in the title.

The first aggregation, popular_tags, runs over that entire result set, so it returns tag counts for every document with "solr" in the title.

The second aggregation, popular_uploaders, first filters the result set down to documents that also have "elasticsearch" in their tags, then aggregates uploaders over that filtered subset.

Finally, post_filter narrows the hits to documents with "solr" in the title and "elasticsearch" in the tags, without affecting either aggregation.

This enables precisely the kind of behavior we see in our faceting sidebar, but without having to run multiple queries in order to exclude the "elasticsearch" filter from our facet request.
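
To make the order of operations concrete, here is a minimal in-memory sketch of the same semantics in plain Python. It does not call OpenSearch. The sample documents and the `simulate_search` helper are hypothetical and exist only for illustration; the title match is approximated with a simple whole-word check.

```python
from collections import Counter

# Hypothetical sample documents; field names mirror the query in this issue.
DOCS = [
    {"title": "intro to solr", "tags": ["solr", "search"], "uploaded_by": "ana"},
    {"title": "solr vs elasticsearch", "tags": ["solr", "elasticsearch"], "uploaded_by": "ben"},
    {"title": "migrating from solr", "tags": ["elasticsearch"], "uploaded_by": "ana"},
    {"title": "kibana basics", "tags": ["elasticsearch"], "uploaded_by": "cy"},
]


def simulate_search(docs, title_term, tag_filter):
    # 1. Main query: every document whose title contains the term.
    matched = [d for d in docs if title_term in d["title"].split()]

    # 2. popular_tags: aggregates over the whole query result, unfiltered.
    popular_tags = Counter(tag for d in matched for tag in d["tags"])

    # 3. popular_uploaders: filter the query result by tag, then aggregate.
    tagged = [d for d in matched if tag_filter in d["tags"]]
    popular_uploaders = Counter(d["uploaded_by"] for d in tagged)

    # 4. post_filter: narrows the hits only; the counts above are unchanged.
    hits = [d for d in matched if tag_filter in d["tags"]]

    return {
        "hits": [d["title"] for d in hits],
        "popular_tags": dict(popular_tags),
        "popular_uploaders": dict(popular_uploaders),
    }


result = simulate_search(DOCS, "solr", "elasticsearch")
print(result)
```

The tag counts still include "solr" and "search" even though the hits are restricted to documents tagged "elasticsearch", which is the behavior the sidebar relies on.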

GET videosearch/_search:

{
  "query": {
    "bool": {
      "must": {
        "match": {
          "title": "solr"
        }
      }
    }
  },
  "size": 0,
  "aggs": {
    "popular_tags": {
      "terms": {
        "field": "tags.keyword",
        "size": 5
      }
    },
    "popular_uploaders": {
      "filter": {
        "term": {
          "tags.keyword": "elasticsearch"
        }
      },
      "aggs": {
        "uploader_tags": {
          "terms": {
            "field": "uploaded_by.keyword"
          }
        }
      }
    }
  },
  "post_filter": {
    "term": {
      "tags.keyword": "elasticsearch"
    }
  }
}
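
The pattern above could be generalized to any number of facets: each facet's aggregation is wrapped in a filter that applies every selected filter except its own, and post_filter applies all of them to the hits. The following is a sketch under stated assumptions, not the Calisphere implementation. The function name `build_faceted_search`, the facet names, and the `values` sub-aggregation name are all hypothetical. Note also that the example request above sets "size": 0, so it returns counts only; this sketch leaves `size` unset.

```python
import json


def build_faceted_search(base_query, facets, selected):
    """Build one search body covering the hits and every facet's counts.

    base_query: the main query clause (a dict).
    facets:     hypothetical mapping of facet name -> keyword field.
    selected:   mapping of facet name -> list of values the user picked.
    """

    def active_filters(skip=None):
        # Combine every selected filter except the one for facet `skip`.
        clauses = [
            {"terms": {facets[name]: values}}
            for name, values in selected.items()
            if values and name != skip
        ]
        return {"bool": {"filter": clauses}} if clauses else {"match_all": {}}

    aggs = {
        name: {
            # Each facet ignores its own selection so its other values
            # remain visible in the sidebar.
            "filter": active_filters(skip=name),
            "aggs": {"values": {"terms": {"field": field}}},
        }
        for name, field in facets.items()
    }

    # post_filter applies all selections to the hits, after aggregation.
    return {"query": base_query, "aggs": aggs, "post_filter": active_filters()}


body = build_faceted_search(
    base_query={"bool": {"must": {"match": {"title": "solr"}}}},
    facets={"tags": "tags.keyword", "uploaders": "uploaded_by.keyword"},
    selected={"tags": ["elasticsearch"]},
)
print(json.dumps(body, indent=2))
```

With only the tags facet selected, the `tags` aggregation runs unfiltered and the `uploaders` aggregation is filtered by the tag, which matches the shape of the hand-written request above.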

amywieliczka commented 4 months ago

This could be considered a post-MVP enhancement, but it is probably one of the first things we should try to boost query performance on Calisphere, even before analyzing Django caching.