inspirehep / rest-api-doc

Documentation of the INSPIRE REST API
https://inspirehep.net
Creative Commons Attribution Share Alike 4.0 International

max number of documents #15

Closed sduquemesa closed 3 years ago

sduquemesa commented 3 years ago

I'm trying to retrieve the 11k+ records that cite this article: https://inspirehep.net/literature/1124337. However, when I use the API, the results are capped at 10k.

For example, the GET request https://inspirehep.net/api/literature?size=100&page=100&q=refersto%3Arecid%3A1124337 returns results 9,901 to 10,000, but the response does not contain a "next" key.

When I try to fetch the next page (page=101) with the GET request https://inspirehep.net/api/literature?size=100&page=101&q=refersto%3Arecid%3A1124337, the result is

{"status": 400, "message": "Maximum number of 10000 results have been reached."}

michamos commented 3 years ago

Yes, that's a current limitation of the API that should be better documented.

You can work around it by splitting your query into several queries based on some criteria, then combining the results at the end. For example, you could use earliest_date to make two queries with <10k results each: https://inspirehep.net/api/literature?size=100&page=1&q=refersto%3Arecid%3A1124337&earliest_date=2000--2015 and https://inspirehep.net/api/literature?size=100&page=1&q=refersto%3Arecid%3A1124337&earliest_date=2016--2021.

This filter is convenient because earliest_date partitions the records into non-overlapping buckets based on the year of the earliest date on each record, so no record is returned by both queries.
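
For anyone who wants to script this, here is a minimal Python sketch of the split-and-combine approach. It is not an official client. The function names (`build_url`, `fetch_json`, `collect`) are made up for illustration, and the response layout it relies on (records under `hits.hits`, each with an `id`, and the following page under `links.next`) is an assumption that should be checked against a real response. Deduplicating by `id` is only a precaution, since the date buckets should not overlap.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_URL = "https://inspirehep.net/api/literature"


def build_url(query, date_range, size=100, page=1):
    """Build a search URL restricted to one earliest_date range, e.g. '2000--2015'."""
    params = {"size": size, "page": page, "q": query, "earliest_date": date_range}
    return f"{API_URL}?{urlencode(params)}"


def fetch_json(url):
    """Fetch a URL and decode the JSON body."""
    with urlopen(url) as response:
        return json.load(response)


def collect(query, date_ranges, fetch=fetch_json, size=100):
    """Run one paginated query per date range and merge the records.

    Assumed response shape (hypothetical, verify before relying on it):
    records in payload["hits"]["hits"], next page URL in payload["links"]["next"].
    """
    records = {}
    for date_range in date_ranges:
        url = build_url(query, date_range, size=size)
        while url:
            payload = fetch(url)
            for hit in payload["hits"]["hits"]:
                records[hit["id"]] = hit  # keyed by id so duplicates collapse
            # Stop when the response carries no "next" link.
            url = payload.get("links", {}).get("next")
    return list(records.values())
```

Calling `collect("refersto:recid:1124337", ["2000--2015", "2016--2021"])` would then issue the two queries above and page through each one. Each date range still has to return fewer than 10k results, so a range that hits the limit needs to be split further.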