Closed acka47 closed 4 years ago
$ curl --header "Accept: application/x-jsonlines" "http://lobid.org/gnd/search?q=type%3AUndifferentiatedPerson+AND+_exists_%3AvariantName&size=10" | jq -r .id > undifferentiated-with-variantName.txt
import requests
import json
filepath = 'undifferentiated-with-variantName.txt'
fp = open(filepath)
count = 0

def build_url(id):
    return 'https://lobid.org/resources/search?q=contribution.agent.id%3A%22' + id.rstrip() + '%22'

for id in fp.readlines():
    if requests.get(build_url(id)).json()['totalItems'] > 0:
        print id
        count += 1

print count
The script is currently running. Will provide the number of IDs when finished.
After ~18 hours the script stopped running and threw an error:
Traceback (most recent call last):
  File "gnd-ids-in-hbz01.py", line 13, in <module>
    if requests.get(build_url(id)).json()['totalItems'] > 0:
  File "/home/acka47/.local/lib/python2.7/site-packages/requests/models.py", line 897, in json
    return complexjson.loads(self.text, **kwargs)
  File "/usr/lib/python2.7/json/__init__.py", line 339, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python2.7/json/decoder.py", line 364, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python2.7/json/decoder.py", line 382, in raw_decode
    raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded
I will modify it a bit and try again.
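The traceback suggests that one response body was not valid JSON (for example an error page or an empty reply), which aborted the whole run. Below is a minimal sketch, not the script that was actually used, of how the loop could record such responses and carry on. The names `total_items`, `count_hits`, and the `fetch` callable (anything that returns the response body for an ID as a string) are hypothetical and only serve the illustration.

```python
import json


def total_items(body):
    """Return the 'totalItems' value from a response body,
    or None if the body is not a usable JSON object."""
    try:
        data = json.loads(body)
    except (ValueError, TypeError):
        # json.JSONDecodeError is a subclass of ValueError;
        # TypeError covers a body of None.
        return None
    if not isinstance(data, dict):
        return None
    return data.get("totalItems")


def count_hits(ids, fetch):
    """Count IDs with at least one hit and collect the IDs
    whose response could not be parsed, instead of crashing."""
    count = 0
    failed = []
    for gnd_id in ids:
        n = total_items(fetch(gnd_id))
        if n is None:
            failed.append(gnd_id)
        elif n > 0:
            count += 1
    return count, failed
```

The IDs collected in `failed` could then be retried in a second pass, so that a single bad response does not cost many hours of work.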
I tweaked it a bit and am now running it directly against the ES index. It is much faster (~100 requests per second).
Here is the updated script for step 2 (with the index URL obfuscated):
import requests
import json
filepath = 'undifferentiated-with-variantName.txt'
fp = open(filepath)
total = 0
count = 0

def build_url(id):
    return 'index/_search?q=contribution.agent.id%3A%22' + id.rstrip() + '%22'

for id in fp.readlines():
    total += 1
    if requests.get(build_url(id)).json()['hits']['total'] > 0:
        count += 1

print("%s IDs of %s are in hbz01." % (count, total))
Result: 502466 IDs of 975198 are in hbz01.
Closing.
Requested by colleague I.G. in the context of the effort to remove undifferentiated persons from GND (see https://wiki.dnb.de/x/aJDOC for background).
contribution.agent.id:"http://d-nb.info/gnd/103804528"
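For reference, a query of this form can be turned into a search URL by percent-encoding it with the standard library rather than hand-writing `%3A` and `%22`. This is a sketch under assumptions: it uses Python 3, it takes the lobid-resources search endpoint from the first script as the base URL, and it assumes each line of the ID file holds a full GND URI as in the example query.

```python
from urllib.parse import quote

# Endpoint taken from the first script in this thread.
BASE = "https://lobid.org/resources/search"


def build_url(gnd_id):
    """Build a search URL for one GND URI (surrounding whitespace is stripped)."""
    query = 'contribution.agent.id:"%s"' % gnd_id.strip()
    # safe="" also encodes '/' so that the URI inside the quotes is fully escaped.
    return BASE + "?q=" + quote(query, safe="")


print(build_url("http://d-nb.info/gnd/103804528\n"))
```

Encoding the whole query in one call also covers any other reserved characters that may appear in an ID.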