Open fsteeg opened 7 years ago
This can be reproduced with any queries returning high result counts, e.g. owner facet for: http://lobid.org/resources/search?q=k%C3%B6ln
The basic problem here is that we are faceting over a field (the item owner) that's not in our data. This approach won't work for the entire catalog: if we query everything, we'd have to get all items, and create the owner facet from that.
Instead, I suggest we add an exemplar.owner field, so for example in http://lobid.org/resources/HT012213725?format=json we'd have:
"exemplar": [{
"id": "http://lobid.org/items/HT012213725:DE-6:ZD%207381#!",
"owner": "http://lobid.org/organisations/DE-6",
"label": "lobid Bestandsressource"
}],
That way, we could simply facet over exemplar.owner directly, which would give us all owners (not all items, as with the current facet, which is based on exemplar.id).
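A minimal sketch of what such a facet request could look like, assuming the proposed exemplar.owner field exists and is indexed as a non-analyzed (keyword) field. The function name, the use of a query_string query, and the default bucket count are illustrative assumptions, not the current lobid implementation:

```python
def owner_facet_query(query_string: str, max_owners: int = 100) -> dict:
    """Build an Elasticsearch request body with a terms aggregation on the
    proposed exemplar.owner field (hypothetical until the field is added)."""
    return {
        # We only want the facet here, not the hits themselves.
        "size": 0,
        "query": {"query_string": {"query": query_string}},
        "aggs": {
            "owner": {
                # One bucket per owner URI; doc_count counts resources.
                "terms": {"field": "exemplar.owner", "size": max_owners}
            }
        },
    }


body = owner_facet_query("köln")
```

The buckets would then be keyed by organisation URIs such as http://lobid.org/organisations/DE-6 rather than by item IDs.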
What do you think @dr0i @acka47? If it makes no sense to expose the owner in the data (but I do think it's useful for API usage), we could also create an internal Elasticsearch field or a custom aggregation. If we do want to expose it, we should add it on the Metafacture level.
+1 from me. I already proposed embedding item information in the instance data, see #140. We might just reopen that issue.
Using a child aggregation on our data, querying "köln" seems to give a plausible result:
"hits" : {
"total" : 569.808,
...
"aggregations" : {
"items" : {
"doc_count" : 1.686.515,
"top-isil" : {
...
"buckets" : [ {
"key" : "http://lobid.org/organisations/DE-38",
"doc_count" : 172.288
} ...
I can imagine that the factor of 3 in the resources/items ratio is a result of libraries holding more than one item. Is this acceptable, or do you really want a ratio of 1? I doubt, though, that if we take the data from the child into the parent and subsequently have e.g. 3 identical exemplar.owner.id values (reflecting the fact of multiple holdings of a manifestation, aka "resource"), an aggregation over this would result in that 1:1 ratio (without tinkering with filter or something).
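To make the two ways of counting concrete, here is a small sketch in plain Python on made-up toy data (the resource and item IDs are invented; only the owner ISILs come from this thread). It illustrates the counting semantics only and makes no claim about how Elasticsearch handles duplicate values:

```python
from collections import Counter

# Toy data: r1 is held twice by DE-6 and once by DE-38, r2 once by DE-38.
resources = [
    {"id": "r1", "exemplar": [
        {"id": "i1", "owner": "DE-6"},
        {"id": "i2", "owner": "DE-6"},
        {"id": "i3", "owner": "DE-38"},
    ]},
    {"id": "r2", "exemplar": [{"id": "i4", "owner": "DE-38"}]},
]


def items_per_owner(resources):
    """Count every item: multiple holdings of one resource count repeatedly."""
    return Counter(item["owner"] for r in resources for item in r["exemplar"])


def resources_per_owner(resources):
    """Count each resource once per owner, however many items the owner holds."""
    return Counter(
        owner
        for r in resources
        for owner in {item["owner"] for item in r["exemplar"]}
    )


item_counts = items_per_owner(resources)
resource_counts = resources_per_owner(resources)
```

Here DE-6 gets an item count of 2 but a resource count of 1, which is the kind of gap between facet counts and hit counts discussed above.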
Oh nice, a child aggregation, I didn't consider that. That should work, I will try it.
Reopening, see discussion starting in https://github.com/hbz/lobid-resources/issues/278#issuecomment-283329330.
This came up again, see #1169, where @hagbeck wrote:
From the Aleph based index we're getting 1.334.514 records [1]. The facet "Bestand in Bibliotheken" in the Aleph based index shows 1.471.170 records.
I pointed out this problem in https://github.com/hbz/lobid-resources/issues/278#issuecomment-283333385:
Isn't the underlying mechanism that the facet gives the number of items while the query result lists the FRBR manifestations (or in bibframe-speak: instances)?
This came up again in the context of the comparison of ALMA and ALEPH resources of UB Münster. Ideally this should be fixed before ALMA Fix replaces ALEPH-Morph. https://github.com/hbz/lobid-resources/issues/1601
@blackwinter will take a look whether this should be added to milestone DigiBib or not.
We would not be affected by this issue.
Since owner counts are derived from exemplar aggregations, and aggregation requests have a limited size, the owner counts are wrong: they only cover the owners of the X most frequent exemplars, each of which actually occurs just once. To fix this, we have to make the aggregation processing efficient enough to allow an aggregation request with unlimited size for exemplars.
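One possible alternative to a single unlimited request, sketched here under assumptions, is to page through all buckets with an Elasticsearch composite aggregation. This is not necessarily the fix meant above. The search callable, the aggregation name, and the field name passed in are hypothetical stand-ins:

```python
from typing import Callable, Iterator, Optional


def composite_body(field: str, page_size: int, after: Optional[dict]) -> dict:
    """Request body for one page of a composite aggregation over `field`."""
    composite = {
        "size": page_size,
        "sources": [{"owner": {"terms": {"field": field}}}],
    }
    if after is not None:
        # Resume after the last key of the previous page.
        composite["after"] = after
    return {"size": 0, "aggs": {"owners": {"composite": composite}}}


def all_buckets(search: Callable[[dict], dict], field: str,
                page_size: int = 1000) -> Iterator[dict]:
    """Yield every bucket, following after_key until no further page exists.

    `search` is any function that sends a request body to the index and
    returns the parsed JSON response (hypothetical, supplied by the caller).
    """
    after = None
    while True:
        response = search(composite_body(field, page_size, after))
        agg = response["aggregations"]["owners"]
        yield from agg["buckets"]
        after = agg.get("after_key")
        if after is None:
            break
```

Each page stays small, so no single request needs an unlimited size. The counts per owner would then be summed on the client side.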