Closed acka47 closed 4 years ago
Harvested and indexed, see list above.
Ok, I can now check some basic information, e.g.:
However, it would be great to have aggregations. Do we have to configure something for this or can I already view aggregations?
Fixed kibana by restarting it.
Re aggs: possible by defining fields as keywords. I did this for affiliation - tell me which fields you want to have aggs for and I will configure them. But using (huge) literals as keys is not a good idea, as shown by this:
curl -XGET 'https://lobid.org/eslabs/deepgreen/_search?q=metadata.author.affiliation:*germany*&pretty=true' -H 'Content-Type: application/json' -d '
{
  "aggs": {
    "aggs1": {
      "terms": {
        "field": "metadata.author.affiliation",
        "size": 50
      }
    }
  }
}
'
because, as can clearly be seen in the result of the query above, literals are seldom unique.
I am ok with Kibana for now so don't need any aggregations configured.
as can clearly be seen in the result of the query above, literals are seldom unique.
Do you mean they are all unique? However, I agree that aggregations don't make sense for metadata.author.affiliation.
Closing.
What is strange is that I cannot limit my search to the field metadata.author.affiliation, e.g. https://lobid.org/eslabs/deepgreen/_search?q=metadata.author.affiliation:k%C3%B6ln won't give any hits although there are lots of cases that should match. As it works with https://lobid.org/eslabs/deepgreen/_search?q=k%C3%B6ln, the field must be indexed, though.
To enable aggregations, this field must be of a type that is not analyzed, i.e. you can only look it up with the complete value (i.e. the huge blob). Maybe we should dump the idea of having aggs of this field?
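For reference, Elasticsearch also supports multi-fields, where one field is indexed both as analyzed text and as an unanalyzed keyword sub-field. The sketch below is not the mapping used in this thread. It assumes the object structure implied by the dotted path metadata.author.affiliation, and the helper names affiliation_mapping and terms_agg are made up for illustration. On Elasticsearch versions before 7 the mapping would additionally be nested under a type name.

```python
import json


def affiliation_mapping(ignore_above: int = 256) -> dict:
    """Map affiliation as analyzed text with an unanalyzed keyword sub-field."""
    affiliation = {
        "type": "text",  # analyzed: q=metadata.author.affiliation:köln can match
        "fields": {
            # Unanalyzed copy for aggregations; values longer than
            # ignore_above characters are not indexed in this sub-field.
            "keyword": {"type": "keyword", "ignore_above": ignore_above}
        },
    }
    # Assumption: metadata and author are plain object fields.
    return {
        "properties": {
            "metadata": {
                "properties": {
                    "author": {"properties": {"affiliation": affiliation}}
                }
            }
        }
    }


def terms_agg(field: str, size: int = 50) -> dict:
    """Build a terms aggregation body like the one in the curl call above."""
    return {"size": 0, "aggs": {"aggs1": {"terms": {"field": field, "size": size}}}}


print(json.dumps(affiliation_mapping(), indent=2))
print(json.dumps(terms_agg("metadata.author.affiliation.keyword")))
```

With such a mapping, fielded searches would go against metadata.author.affiliation and aggregations against metadata.author.affiliation.keyword. Whether aggregating over long affiliation strings is useful at all is a separate question.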
Maybe we should dump the idea of having aggs of this field?
+1
Changed the mappings and reindexed again. (note to self: scripts reside at @aither:~/oa-deepgreen).
Closing this, as we provided some analytics for management and there are no requests pending.
We were asked to check out the DeepGreen data. As we don't have a repo for that (yet), I am posting the issue here. To get a good impression of the data, I suggest indexing it in Elasticsearch.
Here are the API basics:
Get resource by id:
https://www.oa-deepgreen.de/api/v1/notification/<id>
Get resources by indexing date:
https://www.oa-deepgreen.de/api/v1/routed?since=<date>
Paging & page size:
pageSize=<size>, with 100 being the maximum, e.g. pageSize=100
page=<page>, from 1 to n, e.g. page=2
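A minimal sketch of how these parameters combine into a request URL. The function name routed_url and the clamping of the page size to 100 are illustrative choices, not part of the API description:

```python
from urllib.parse import urlencode

BASE_URL = "https://www.oa-deepgreen.de/api/v1/routed"
MAX_PAGE_SIZE = 100  # maximum page size stated above


def routed_url(since: str, page: int = 1, page_size: int = MAX_PAGE_SIZE) -> str:
    """Build the URL for one page of resources routed since the given date."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    # Clamp to the stated maximum instead of sending an oversized request.
    page_size = max(1, min(page_size, MAX_PAGE_SIZE))
    query = urlencode({"since": since, "pageSize": page_size, "page": page})
    return f"{BASE_URL}?{query}"


print(routed_url("2019-07-18", page=2))
```

When the URL is passed to curl on the command line, it has to be quoted, otherwise the shell interprets the & characters.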
In the response, the resource descriptions are found in the notifications array. Accordingly, we have to do the following:
Extract the notifications array (e.g. with $ curl 'https://www.oa-deepgreen.de/api/v1/routed?since=2019-07-18&pageSize=100&page=1' | jq -r '.notifications[]') and pipe it into a JSON file.
Note: ES index names must be lowercased!
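The harvesting step could be sketched roughly as follows. Several things here are assumptions and not confirmed by the API description: that paging ends with an empty notifications array, that each notification carries its identifier in an "id" field, and that the data is loaded via the Elasticsearch bulk API. The names iter_notifications, to_bulk_lines and fake_pages are hypothetical, and fake_pages holds made-up data so the sketch runs offline.

```python
import json
from typing import Callable, Iterable, Iterator


def iter_notifications(fetch_page: Callable[[int], dict]) -> Iterator[dict]:
    """Yield every notification, requesting pages 1, 2, ... until one is empty."""
    page = 1
    while True:
        notifications = fetch_page(page).get("notifications", [])
        if not notifications:  # assumed end-of-data signal
            return
        yield from notifications
        page += 1


def to_bulk_lines(notifications: Iterable[dict], index_name: str) -> Iterator[str]:
    """Turn notifications into action/document line pairs for the ES bulk API."""
    index_name = index_name.lower()  # ES index names must be lowercase
    for doc in notifications:
        action = {"index": {"_index": index_name}}
        if "id" in doc:  # assumption: the notification id lives in "id"
            action["index"]["_id"] = doc["id"]
        yield json.dumps(action)
        yield json.dumps(doc)


# Stand-in for the real API (made-up data): page 2 is empty, so paging stops.
fake_pages = {1: {"notifications": [{"id": "a"}, {"id": "b"}]}, 2: {"notifications": []}}
lines = list(to_bulk_lines(iter_notifications(lambda p: fake_pages[p]), "DeepGreen"))
print("\n".join(lines))
```

In real use, fetch_page would wrap an HTTP GET of the routed endpoint for the given page and return the parsed JSON. The joined lines need a trailing newline before being posted to _bulk, and older Elasticsearch versions may also require a _type in the action line.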