te-papa / collections-api

Museum of New Zealand Te Papa Tongarewa - Collections API

Getting started guide: Faceting and filtering #6

Closed fkleon closed 5 years ago

fkleon commented 5 years ago

The current guide mentions that field search on nested fields is not possible. It is, however, possible to filter and facet on them, which is often what the use cases involving nested fields actually need. I suggest adding a Filtering section that explains filters and facets.

Here is a draft:

Faceting and filtering

Faceting generally describes the dynamic grouping of data into categories (or "terms"), which can then be used to filter the results and drill down into the data. It is a tool for producing a high-level overview of the data, and is often used to discover entry points for further queries.

An example of faceting in action can be found on the Collections Online website. After you perform a search, the search interface displays the categories that the data falls into, for example the Collection associated with objects:

[Image: collection-facet]

In this example, the data is faceted on the Collection field, offering additional context to the list of search results.

Faceting

In addition to performing searches, the advanced search interface allows you to perform a faceted search. The faceting implementation uses Elasticsearch terms aggregations under the hood.

Specify the faceted fields along with the number of results you want to receive for each facet, in the facets parameter of the search request:

POST https://data.tepapa.govt.nz/collection/search

{
  "query" : "James Cook",
  "facets": [ {
    "field": "production.facetCreatedDate.decadeOfCentury",
    "size": 3
  }, {
    "field": "production.spatial.title",
    "size": 3
  } ]
}
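
As a rough illustration, the request body above could be assembled programmatically. This is a minimal Python sketch, not part of the API or its documentation: the helper names `facet_search_payload` and `build_search_request` are hypothetical, and the `Content-Type` header and the `extra_headers` hook (for any authentication your setup may need) are assumptions. The request object is only built, not sent.

```python
import json
import urllib.request

# Endpoint taken from the example request above.
SEARCH_URL = "https://data.tepapa.govt.nz/collection/search"


def facet_search_payload(query, facet_fields, size=3):
    """Build the JSON body for a faceted search (hypothetical helper)."""
    return {
        "query": query,
        "facets": [{"field": field, "size": size} for field in facet_fields],
    }


def build_search_request(payload, extra_headers=None):
    """Wrap a payload in a POST request object without sending it.

    Assumption: the API accepts a JSON body with a Content-Type header.
    Any authentication header your setup needs can go in extra_headers.
    """
    headers = {"Content-Type": "application/json"}
    headers.update(extra_headers or {})
    return urllib.request.Request(
        SEARCH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


payload = facet_search_payload(
    "James Cook",
    ["production.facetCreatedDate.decadeOfCentury", "production.spatial.title"],
)
request = build_search_request(payload)
# Passing `request` to urllib.request.urlopen would send it; left out here.
```

Keeping payload construction separate from sending makes it easy to inspect or log the exact JSON before it goes over the wire.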

The facets field of the response contains, for each requested facet, the top N terms (where N is the requested size) along with the number of matching documents for each term. In our example these are the most common production decades, and the most common production locations, for objects matching the search query James Cook:

{
  "results": [ ... ],
  "facets": {
    "production.facetCreatedDate.decadeOfCentury": {
      "1940s": 11751,
      "1960s": 12736,
      "1970s": 11751
    },
    "production.spatial.title.verbatim": {
      "New Zealand": 71945,
      "North Island (New Zealand)": 5171,
      "United Kingdom": 6031
    }
  },
  "_metadata": { ... }
}

Facet labels are returned in alphabetical order.
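
Because the labels come back alphabetically rather than by count, a client that wants a "most common first" view has to re-sort them. The following Python sketch shows one way to do that, using the counts from the sample response above; `rank_facets` is a hypothetical helper name, and the tie-breaking rule (alphabetical by label) is an arbitrary choice for the example.

```python
def rank_facets(response, top=None):
    """Return each facet's (label, count) pairs, highest count first.

    Ties are broken alphabetically by label. `top` optionally limits
    how many pairs are kept per facet.
    """
    ranked = {}
    for field, counts in response.get("facets", {}).items():
        pairs = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
        ranked[field] = pairs if top is None else pairs[:top]
    return ranked


# Facet counts copied from the sample response above.
sample_response = {
    "facets": {
        "production.facetCreatedDate.decadeOfCentury": {
            "1940s": 11751,
            "1960s": 12736,
            "1970s": 11751,
        },
        "production.spatial.title.verbatim": {
            "New Zealand": 71945,
            "North Island (New Zealand)": 5171,
            "United Kingdom": 6031,
        },
    }
}

ranked = rank_facets(sample_response)
for field, pairs in ranked.items():
    print(field, pairs)
```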

Notes:

Filtering

Simple filtering is supported within the search query, for example by appending AND collection:Art to the query. This simple syntax is, however, mostly restricted to root-level fields and does not work on all nested fields. The advanced search interface offers richer filtering. This feature is mainly designed as a counterpart to the Faceting implementation, and allows you to filter on all facetable fields, including nested fields.

This example filters the results to only include objects that were produced in the 1970s. Note that the filter keywords generally match the labels coming back from a facet request:

POST https://data.tepapa.govt.nz/collection/search

{
  "query" : "*",
  "filters": [ {
    "field": "production.facetCreatedDate.decadeOfCentury",
    "keyword": "1970s"
  } ]
}
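
Since filter keywords generally match facet labels, a typical drill-down flow turns a label the user picked into a filter entry. Here is a minimal Python sketch of that step; `filter_search_payload` is a hypothetical helper name. The example above shows only a single filter, so passing several entries (and how the API combines them) is an assumption. Note also that the sample facet response keys one facet as `production.spatial.title.verbatim` although `production.spatial.title` was requested; which of the two names a filter should use is not stated here, so check it against the reference documentation.

```python
def filter_search_payload(selected, query="*"):
    """Build a filtered search body from a {field: facet label} mapping.

    Assumption: each selected label becomes one {"field", "keyword"}
    entry in the "filters" array, mirroring the example request above.
    """
    return {
        "query": query,
        "filters": [
            {"field": field, "keyword": keyword}
            for field, keyword in selected.items()
        ],
    }


# Reproduces the example request: everything produced in the 1970s.
payload = filter_search_payload(
    {"production.facetCreatedDate.decadeOfCentury": "1970s"}
)
print(payload)
```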

The full reference documentation for advanced search requests can be found here.

staplegun commented 5 years ago

Added in new wiki page - https://github.com/te-papa/collections-api/wiki/Search-strategies