Closed agnesgaroux closed 1 month ago
Christy's original email
I’m trying to compare how many books we’ve digitised against the entirety of the catalogued collection. This is proving quite tricky to do through the user interfaces.
Is it possible to use the API to find out how many books we have catalogued in Sierra that were published before 1900?
And following on from that, is it possible to tell which ones do not have a version that’s digitised/hosted by Wellcome (as opposed to third party subscription sites)? And then to export a list of those?
Happy to chat further if anyone has time to look into this – it’s not super urgent.
Query
// works that are books
// that were produced before 1900
// that are not available online
GET works-indexed-2024-08-15/_count
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "filterableValues.format.id": "a"
          }
        },
        {
          "range": {
            "query.production.label": {
              "lt": 1900
            }
          }
        }
      ],
      "must_not": [
        {
          "match": {
            "filterableValues.availabilities.id": "online"
          }
        }
      ]
    }
  }
}
Aggs to add up the items for the found works
"aggs": {
  "total_items_id": {
    "sum": {
      "script": {
        "source": "doc['query.items.id'].length"
      }
    }
  }
}
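One thing to watch: the `_count` endpoint only returns a document count and does not run aggregations, so the sum above would need to go through `_search` instead. A minimal Python sketch of how the two fragments could be combined into one request body follows. The helper name `build_search_body` is hypothetical, and setting `size` to 0 and `track_total_hits` to true is an assumption about what is wanted here (no hits returned, exact total count).

```python
import json


def build_search_body(query: dict, aggs: dict) -> dict:
    """Combine a query and aggregations into a _search body (hypothetical helper).

    size=0 suppresses the hits themselves; track_total_hits=True asks for an
    exact total rather than the default capped count.
    """
    return {"size": 0, "track_total_hits": True, "query": query, "aggs": aggs}


# The query from this issue: books, produced before 1900, not available online.
query = {
    "bool": {
        "must": [
            {"match": {"filterableValues.format.id": "a"}},
            {"range": {"query.production.label": {"lt": 1900}}},
        ],
        "must_not": [
            {"match": {"filterableValues.availabilities.id": "online"}}
        ],
    }
}

# The aggregation from this issue: sum the number of item ids per work.
aggs = {
    "total_items_id": {
        "sum": {"script": {"source": "doc['query.items.id'].length"}}
    }
}

body = build_search_body(query, aggs)
print(json.dumps(body, indent=2))
```

The printed JSON could then be sent as the body of `GET works-indexed-2024-08-15/_search`.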
I have a local script to query the index and parse the hits into a CSV, with identifier (bNumber), title and workId (potentially looking to format these into works URLs).
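The local script is not included in this issue, so the following is only a rough sketch of what the hit-to-CSV step might look like. The `_source` field names (`bNumber`, `title`), the fallback to the hit `_id` for the work id, and the works URL pattern are all assumptions for illustration, not the real index mapping.

```python
import csv
import io

# Assumed URL pattern for a work page; adjust if the real one differs.
WORKS_URL = "https://wellcomecollection.org/works/{work_id}"


def hits_to_csv(hits, out):
    """Write one CSV row per hit: bNumber, title, workId and a works URL.

    The field names read from _source are hypothetical placeholders.
    """
    writer = csv.writer(out)
    writer.writerow(["bNumber", "title", "workId", "url"])
    for hit in hits:
        source = hit.get("_source", {})
        # Fall back to the document id when no explicit workId is present.
        work_id = source.get("workId", hit.get("_id", ""))
        writer.writerow([
            source.get("bNumber", ""),
            source.get("title", ""),
            work_id,
            WORKS_URL.format(work_id=work_id),
        ])


# Made-up sample hit, shaped like an Elasticsearch search response entry.
sample_hits = [
    {"_id": "abc123", "_source": {"bNumber": "b00000001", "title": "An example title"}},
]

buffer = io.StringIO()
hits_to_csv(sample_hits, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Writing to a file instead of a string buffer only needs `open("works.csv", "w", newline="")` passed as `out`.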
Blocked: Christy is on leave right now and we need some clarifications on a few points
You can pause on this if you like. I'm checking whether locations would be helpful to add in determining sets that are suitable for digitisation. There's a lot of stuff on the list of 10k that you sent that we have already looked through (journals catalogued as books, etc.), but location might help to narrow down the collections we're interested in.
Closing this. Will open a new issue if she wants the location added.
Email from Natalie after she talked to Christy