hbz / lobid

Linking Open Bibliographic Data
https://lobid.org/
Eclipse Public License 2.0

Include labels with aggregation (facet) data in API search results #414

Closed kshepherd closed 4 years ago

kshepherd commented 4 years ago

An API search for a subject heading with keyword "berlin" returns some aggregation data: geographic ID and subject category. These would be very useful to use as quick and easy facets / filters for further search refinement (assuming these fields can be used as filters?), but they would need to also have human-readable labels in the JSON. Example search: https://lobid.org/gnd/search?q=berlin&format=json&filter=type%3ASubjectHeading

So, I have a few related questions and, if the answer is "yes", a feature request:

  1. Can subject category and/or geographic ID be used as a search filter like 'type' can? (Or, if not, could we construct a query such that it acts as a filter?)
  2. Related, can we include multiple filters / by-field queries in a single search?

If this kind of searching is possible, it would be great to use aggregations to help users refine searches for subject headings or places. In that case, could labels be included alongside the IDs in the aggregation JSON?

"aggregation" : {
  "gndSubjectCategory.id" : [
    { "key" : "https://d-nb.info/standards/vocab/gnd/gnd-sc#2.1", "doc_count" : 107 },
    { "key" : "https://d-nb.info/standards/vocab/gnd/gnd-sc#12.1b", "doc_count" : 17 },
    { "key" : "https://d-nb.info/standards/vocab/gnd/gnd-sc#12.4", "doc_count" : 15 },
    { "key" : "https://d-nb.info/standards/vocab/gnd/gnd-sc#13.1a", "doc_count" : 15 },
    {......................

fsteeg commented 4 years ago

Regarding your two questions: yes, both are possible. You can construct these queries using the UI, then add &format=json to the resulting URL to get the API call: https://lobid.org/gnd/search?q=berlin&filter=+(geographicAreaCode.id:"https://d-nb.info/standards/vocab/gnd/geographic-area-code%23XE-NZ")+(gndSubjectCategory.id:"https://d-nb.info/standards/vocab/gnd/gnd-sc%2313.4p")&format=json
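For anyone building such URLs from code rather than the UI, here is a minimal Python sketch. It mirrors the shape of the example URL above: one parenthesised field:"value" clause per filter, joined by spaces. The function name build_search_url is made up for illustration, and the sketch assumes the server decodes a literal + in the query string as a space, which is how form-style URL encoding normally works.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

SEARCH_URL = "https://lobid.org/gnd/search"


def build_search_url(query, filters, fmt="json"):
    """Build a search URL with several field filters (hypothetical helper).

    `filters` is a list of (field, value) pairs, for example
    ("gndSubjectCategory.id", "https://d-nb.info/standards/vocab/gnd/gnd-sc#13.4p").
    """
    # One parenthesised field:"value" clause per filter, as in the example URL.
    clauses = ['({}:"{}")'.format(field, value) for field, value in filters]
    params = {"q": query, "filter": " ".join(clauses), "format": fmt}
    # urlencode escapes '#' as %23 and writes spaces as '+'.
    return SEARCH_URL + "?" + urlencode(params)


url = build_search_url(
    "berlin",
    [
        ("geographicAreaCode.id",
         "https://d-nb.info/standards/vocab/gnd/geographic-area-code#XE-NZ"),
        ("gndSubjectCategory.id",
         "https://d-nb.info/standards/vocab/gnd/gnd-sc#13.4p"),
    ],
)
print(url)

# Round trip: decoding the query string gives back the filter expression.
decoded_filter = parse_qs(urlsplit(url).query)["filter"][0]
print(decoded_filter)
```

The encoded URL looks different from the hand-written one (more characters are percent-escaped), but it should decode to the same filter expression on the server side.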

Regarding your feature request: it would be non-trivial to add the values to the aggregation API. The aggregations are a feature provided by the Elasticsearch backend, which provides the values and counts for a specific field.

We could allow custom fields here, like geographicAreaCode.label, which would provide you with the labels instead of the IDs. However, this results in less precise searches (based on labels, not IDs) and less stable URLs (labels can change). You also don't have the actual URIs/IDs at that point, in case you need them.

What we do in our UI instead is to use URIs in the aggregations and the search links that we generate, and use the labels for display only. For the fields you mention (where values don't start with https://d-nb.info/gnd/), the labels are defined in a few ontology files, which we read and provide to the UI via a static GndOntology.label(String id) method in https://github.com/hbz/lobid-gnd/blob/master/app/models/GndOntology.java. Maybe you can reuse that approach in your context?
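The same idea (IDs for searching, labels for display only) can be sketched in Python on the client side. This is not the GndOntology implementation, just an illustration: the LABELS mapping stands in for whatever is parsed from the ontology files, its label strings are placeholders rather than real GND labels, and with_labels is a made-up helper name. The bucket keys and counts are taken from the aggregation excerpt earlier in this thread.

```python
# Placeholder mapping from URI to display label. In practice this would be
# loaded from the ontology files; the label strings here are NOT real labels.
LABELS = {
    "https://d-nb.info/standards/vocab/gnd/gnd-sc#2.1": "placeholder label for 2.1",
    "https://d-nb.info/standards/vocab/gnd/gnd-sc#12.4": "placeholder label for 12.4",
}


def with_labels(buckets, labels):
    """Return copies of the aggregation buckets with a display label added.

    The URI in "key" is left untouched so it can still be used in filters;
    if no label is known, the URI itself is used as a fallback.
    """
    return [
        dict(bucket, label=labels.get(bucket["key"], bucket["key"]))
        for bucket in buckets
    ]


buckets = [
    {"key": "https://d-nb.info/standards/vocab/gnd/gnd-sc#2.1", "doc_count": 107},
    {"key": "https://d-nb.info/standards/vocab/gnd/gnd-sc#12.1b", "doc_count": 17},
    {"key": "https://d-nb.info/standards/vocab/gnd/gnd-sc#12.4", "doc_count": 15},
]

labelled = with_labels(buckets, LABELS)
for bucket in labelled:
    print(bucket["label"], bucket["doc_count"])
```

Keeping the URI in every bucket means the facet links can still filter on the stable ID even if a label changes later.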

(The part that uses the internal index is for the URIs that start with https://d-nb.info/gnd/, i.e. GND entities themselves. If you also need labels for these, you could get the labels, i.e. the preferredName of that entity, from the API directly, e.g. curl -s https://lobid.org/gnd/133992535.json | jq '.preferredName'. Otherwise, you can ignore/remove that part.)
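A rough Python equivalent of the curl call above, for clients that want to resolve GND entity URIs to names. The helper names entity_url, http_get and preferred_name are invented for this sketch; the URL pattern and the preferredName field come from the curl example. The fetch parameter is only there so the lookup can be tried without network access.

```python
import json
from urllib.request import urlopen


def entity_url(gnd_uri_or_id):
    """Map a GND URI such as https://d-nb.info/gnd/133992535, or a bare ID,
    to the corresponding lobid-gnd JSON URL."""
    gnd_id = gnd_uri_or_id.rsplit("/", 1)[-1]
    return "https://lobid.org/gnd/{}.json".format(gnd_id)


def http_get(url):
    """Fetch a URL and return the response body as text."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8")


def preferred_name(gnd_uri_or_id, fetch=http_get):
    """Return the preferredName of a GND entity, or None if the field is absent."""
    record = json.loads(fetch(entity_url(gnd_uri_or_id)))
    return record.get("preferredName")


# Usage (performs a real HTTP request, so it is left commented out):
# print(preferred_name("https://d-nb.info/gnd/133992535"))
```

For a facet list with many entries, caching the results would avoid one request per bucket on every page load.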

kshepherd commented 4 years ago

Many thanks for this information, @fsteeg. I now have a nice little "subject heading search UI" working with category filters. I will close this issue, as there is no feature request here: as you note, it's non-trivial to include labels in the ES results, and it is fairly easy to resolve labels offline with ontology files.

acka47 commented 4 years ago

@kshepherd, nice to hear that you got it working. As we are curious, please let us know if any features that use the lobid-gnd API are implemented in DSpace.