wellcomecollection / content-api

The API + ETL pipeline for searching the Wellcome Collection Prismic Repository.
MIT License

[2] Solve duplicate aggregation buckets/filter options #106

Open agnesgaroux opened 6 months ago

agnesgaroux commented 6 months ago

Some of our aggregatableValues for interpretations have the same label but different ids. Case in point: speech-to-text. There are different types of "speech-to-text" (e.g. translation, screen, tablets), all with the same label but different ids. Aggregations are computed by unique label + id. As a result, we have multiple "speech-to-text" filters.

What we want: one filter in the dropdown for all types of speech-to-text.

We can get there in different ways

agnesgaroux commented 6 months ago

This also happened for catalogue search, see https://github.com/wellcomecollection/wellcomecollection.org/blob/9bc05c8b58b215711151cb3b56835e1ee9afb4d7/content/webapp/services/wellcome/catalogue/filters.ts#L151 : "homonymous options are (...) merged."
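
For illustration, merging homonymous options could look roughly like the sketch below. It is not the code at the link above: the `Bucket` shape, the `mergeByLabel` name and the ids in the usage example are all assumptions made for this example.

```typescript
// Hypothetical shape of an aggregation bucket; the real response type may differ.
type Bucket = {
  data: { type: string; id: string; label: string };
  count: number;
};

// One dropdown option per label, remembering every id that shares it.
type MergedOption = { label: string; ids: string[]; count: number };

// Merge buckets that share a label: collect their ids and sum their counts.
function mergeByLabel(buckets: Bucket[]): MergedOption[] {
  const merged = new Map<string, MergedOption>();
  for (const { data, count } of buckets) {
    const existing = merged.get(data.label);
    if (existing) {
      existing.ids.push(data.id);
      existing.count += count;
    } else {
      merged.set(data.label, { label: data.label, ids: [data.id], count });
    }
  }
  // Map preserves insertion order, so options keep the order of first appearance.
  return Array.from(merged.values());
}

// Usage with made-up ids:
const options = mergeByLabel([
  { data: { type: "EventInterpretation", id: "id-a", label: "Speech-to-text" }, count: 3 },
  { data: { type: "EventInterpretation", id: "id-b", label: "Speech-to-text" }, count: 2 },
  { data: { type: "EventInterpretation", id: "id-c", label: "Hearing loop" }, count: 1 },
]);
```

One caveat with merging on the client like this: summing counts can over-count if a single document carries two ids with the same label, which is one argument for aggregating on the label in the index instead.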

rcantin-w commented 6 months ago

The complexity of it stems from Content API mostly using IDs in queries instead of labels (as does the Catalogue API), if you ignore the "hacky" ones, like the future location=online.

It might need to get done on the BE using a UID created out of thin air; we agree to discuss with the brain trust and make a decision based on that. Ticket will not be ready to be worked on until we figure that one out.

agnesgaroux commented 6 months ago

Current eventDocument filter and aggregatableValues for interpretations:

"filter": {
  "interpretationIds": [
    "ZW751RIAACUAvsjx",
    "YqiCnxEAACMA8VLW",
    "Wn3STCoAACgAIedR"
  ]
},
"aggregatableValues": {
  "interpretations": [
    """{"type":"EventInterpretation","id":"ZW751RIAACUAvsjx","label":"British Sign Language"}""",
    """{"type":"EventInterpretation","id":"YqiCnxEAACMA8VLW","label":"Speech-to-text"}""",
    """{"type":"EventInterpretation","id":"Wn3STCoAACgAIedR","label":"Hearing loop"}"""
  ]
}

agnesgaroux commented 6 months ago

What we want:

"filter": {
  "interpretationIds": [
    "ZW751RIAACUAvsjx",
    "YqiCnxEAACMA8VLW",
    "Wn3STCoAACgAIedR"
  ],
  "interpretationLabels": [
    "Speech-to-text",
    "Hearing loop",
    "British Sign Language"
  ]
},
"aggregatableValues": {
  "interpretations": [
    """{"type":"EventInterpretation","label":"British Sign Language"}""",
    """{"type":"EventInterpretation","label":"Speech-to-text"}""",
    """{"type":"EventInterpretation","label":"Hearing loop"}"""
  ]
}
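
A minimal sketch of how the pipeline might derive these fields from an event's interpretations, assuming each interpretation arrives as an `{ id, label }` pair. The `toIndexedFields` name and the input shape are hypothetical, not taken from the existing ETL code.

```typescript
// Hypothetical input shape: one entry per interpretation linked to the event.
type Interpretation = { id: string; label: string };

// Build the filter and aggregatableValues fields shown above.
// Labels are de-duplicated so that homonymous interpretations collapse
// into a single aggregatable value, while every id stays filterable.
function toIndexedFields(interpretations: Interpretation[]) {
  const labels = Array.from(new Set(interpretations.map((i) => i.label)));
  return {
    filter: {
      interpretationIds: interpretations.map((i) => i.id),
      interpretationLabels: labels,
    },
    aggregatableValues: {
      // Serialised without an id, so buckets are unique per label only.
      interpretations: labels.map((label) =>
        JSON.stringify({ type: "EventInterpretation", label })
      ),
    },
  };
}

// Usage with made-up ids: two "Speech-to-text" variants and one "Hearing loop".
const indexed = toIndexedFields([
  { id: "id-a", label: "Speech-to-text" },
  { id: "id-b", label: "Speech-to-text" },
  { id: "id-c", label: "Hearing loop" },
]);
```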

agnesgaroux commented 6 months ago

With the above, we will be able to:

Am I correct that interpretation labels will be sent as query params, while format and audience will be sent as Prismic ids? Do we also want to search/filter formats and audiences by label instead of id, for consistency?

jamieparkinson commented 6 months ago

Slight side issue: this has thrown up that we need to make sure the filter query parameter names match the display model, as per https://github.com/wellcomecollection/docs/tree/main/rfcs/037-api-faceting-principles (e.g. this new proposal would use interpretations.label=blah).

There is still a bit of a remaining issue that the aggregations which would be returned by the above (ie the id-less EventInterpretations) wouldn't actually exist anywhere else... I wonder if the pragmatic solution to this is just to change the type in these values to EventInterpretationLabel?
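
To make both points concrete, here is a rough sketch, under several assumptions: that multiple values arrive comma-separated in `interpretations.label`, that the filter is an Elasticsearch `terms` query on the proposed `filter.interpretationLabels` field, and that the new display type is a plain `{ type, label }` object. The function names are made up for this example.

```typescript
// Proposed display type for the id-less aggregation values.
type EventInterpretationLabel = {
  type: "EventInterpretationLabel";
  label: string;
};

const toLabelValue = (label: string): EventInterpretationLabel => ({
  type: "EventInterpretationLabel",
  label,
});

// Shape of an Elasticsearch terms query clause.
type TermsFilter = { terms: Record<string, string[]> };

// Translate the `interpretations.label` query parameter into a terms filter.
// Assumes comma-separated values, which breaks for labels containing commas.
function interpretationLabelFilter(
  query: Record<string, string | undefined>
): TermsFilter | undefined {
  const raw = query["interpretations.label"];
  if (!raw) return undefined;
  const labels = raw
    .split(",")
    .map((l) => l.trim())
    .filter((l) => l.length > 0);
  return labels.length > 0
    ? { terms: { "filter.interpretationLabels": labels } }
    : undefined;
}

// Usage:
const esFilter = interpretationLabelFilter({
  "interpretations.label": "Speech-to-text, Hearing loop",
});
const displayValue = toLabelValue("Speech-to-text");
```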