[2] Solve duplicate aggregation buckets/filter options

agnesgaroux commented 6 months ago

Some of our aggregatableValues for interpretations have the same label but different ids. Case in point: speech-to-text. There are different types of "speech-to-text" (e.g. translation, screen, tablets), all with the same label but different ids. Aggregations are computed by unique label + id, so we end up with multiple "speech-to-text" filters.

What we want: one filter in the dropdown for all types of speech-to-text.

We can get there in different ways:

For the dropdown, merge the options/filters that have the same label and add up their bucket sizes. If the filter is selected, we send all the ids to the API so that it finds and returns documents matching any type of speech-to-text.
Do it at index time: if the interpretation label is "speech-to-text", don't use the original id but a hardcoded "speechToTextInterpretation" uuid, so that they're all aggregated in the same bucket later on. The downside is that we lose granularity: there will be no material distinction between different types of speech-to-text access needs.
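
For illustration, the first option (merging in the dropdown) could look roughly like the sketch below. The `FilterOption` and `MergedFilterOption` shapes and the `mergeOptionsByLabel` name are made up for this example and are not taken from the Content API or the webapp.

```typescript
// Hypothetical shape of a filter option derived from one aggregation bucket.
type FilterOption = {
  id: string;
  label: string;
  count: number;
};

// Hypothetical merged option: one dropdown entry backed by several ids.
type MergedFilterOption = {
  label: string;
  ids: string[];
  count: number;
};

// Group options by label, collecting every id and summing the bucket sizes.
// Options keep the order in which their label first appears.
function mergeOptionsByLabel(options: FilterOption[]): MergedFilterOption[] {
  const merged = new Map<string, MergedFilterOption>();
  for (const { id, label, count } of options) {
    const existing = merged.get(label);
    if (existing) {
      existing.ids.push(id);
      existing.count += count;
    } else {
      merged.set(label, { label, ids: [id], count });
    }
  }
  return Array.from(merged.values());
}
```

One caveat with summing: if a single event carries two different speech-to-text ids, it is counted in both buckets, so the merged count can be higher than the number of matching events.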

agnesgaroux commented 6 months ago

This happened for catalogue search: https://github.com/wellcomecollection/wellcomecollection.org/blob/9bc05c8b58b215711151cb3b56835e1ee9afb4d7/content/webapp/services/wellcome/catalogue/filters.ts#L151, where "homonymous options are (...) merged."

rcantin-w commented 6 months ago

The complexity of it stems from Content API mostly using IDs in queries instead of labels (as does the Catalogue API), if you ignore the "hacky" ones, like the future location=online.

It might need to get done on the BE using a UID created out of thin air; we agree to discuss with the brain trust and make a decision based on that. Ticket will not be ready to be worked on until we figure that one out.

agnesgaroux commented 6 months ago

Current eventDocument filter and aggregatableValues for interpretations

"filter": {
  "interpretationIds": [
    "ZW751RIAACUAvsjx",
    "YqiCnxEAACMA8VLW",
    "Wn3STCoAACgAIedR"
  ]
},
"aggregatableValues": {
  "interpretations": [
    """{"type":"EventInterpretation","id":"ZW751RIAACUAvsjx","label":"British Sign Language"}""",
    """{"type":"EventInterpretation","id":"YqiCnxEAACMA8VLW","label":"Speech-to-text"}""",
    """{"type":"EventInterpretation","id":"Wn3STCoAACgAIedR","label":"Hearing loop"}"""
  ]
}

agnesgaroux commented 6 months ago

What we want

"filter": {
  "interpretationIds": [
    "ZW751RIAACUAvsjx",
    "YqiCnxEAACMA8VLW",
    "Wn3STCoAACgAIedR"
  ],
  "interpretationLabels": [
    "Speech-to-text",
    "Hearing loop",
    "British Sign Language"
  ]
},
"aggregatableValues": {
  "interpretations": [
    """{"type":"EventInterpretation","label":"British Sign Language"}""",
    """{"type":"EventInterpretation","label":"Speech-to-text"}""",
    """{"type":"EventInterpretation","label":"Hearing loop"}"""
  ]
}
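
A minimal sketch of how the fields above could be derived at index time, assuming each event exposes its interpretations as `{ id, label }` pairs. The `Interpretation` type and the `toInterpretationFields` name are hypothetical, and de-duplicating the labels is an assumption on my part.

```typescript
// Hypothetical input shape: an interpretation as attached to an event.
type Interpretation = { id: string; label: string };

// Build the `filter` and `aggregatableValues` fragments shown above.
function toInterpretationFields(interpretations: Interpretation[]) {
  // Assumption: each label should appear once per document, even when
  // several interpretations share it.
  const labels = Array.from(new Set(interpretations.map((i) => i.label)));
  return {
    filter: {
      interpretationIds: interpretations.map((i) => i.id),
      interpretationLabels: labels,
    },
    aggregatableValues: {
      // Label-only values, so every "Speech-to-text" lands in one bucket.
      interpretations: labels.map((label) =>
        JSON.stringify({ type: "EventInterpretation", label })
      ),
    },
  };
}
```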

agnesgaroux commented 6 months ago

With the above, we will be able to:

aggregate EventInterpretation based on label only (so all the Speech-to-text will be bucketed together, no more duplicates 👍)
filter by label, using interpretationLabels

Am I correct that interpretation labels will be sent as query params, while format and audience will be sent as Prismic ids? Do we also want to search/filter formats and audiences by label instead of id, for consistency?

jamieparkinson commented 6 months ago

Slight side issue: this has thrown up that we need to make sure the filter query parameter names match the display model as per https://github.com/wellcomecollection/docs/tree/main/rfcs/037-api-faceting-principles, eg this new proposal would use interpretations.label=blah

There is still a bit of a remaining issue that the aggregations which would be returned by the above (ie the id-less EventInterpretations) wouldn't actually exist anywhere else... I wonder if the pragmatic solution to this is just to change the type in these values to EventInterpretationLabel?
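
To make that concrete, here is a rough sketch of the id-less type and of reading the `interpretations.label` parameter. The `parseInterpretationLabels` helper and the comma-separated format are assumptions for illustration only, not something the RFC or the API confirms.

```typescript
// Proposed id-less display type; the exact shape is an assumption.
type EventInterpretationLabel = {
  type: "EventInterpretationLabel";
  label: string;
};

// Hypothetical helper: read `interpretations.label` from already-decoded
// query parameters. Comma-separated values are an assumption.
function parseInterpretationLabels(
  query: Record<string, string | undefined>
): EventInterpretationLabel[] {
  const raw = query["interpretations.label"] ?? "";
  return raw
    .split(",")
    .map((label) => label.trim())
    .filter((label) => label.length > 0)
    .map((label) => ({ type: "EventInterpretationLabel" as const, label }));
}
```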

wellcomecollection / content-api

[2] Solve duplicate aggregation buckets/filter options #106