inspirehep / rest-api-doc

Documentation of the INSPIRE REST API
https://inspirehep.net
Creative Commons Attribution Share Alike 4.0 International

Plans for the OAI interface and new API output #10

Closed steog88 closed 3 years ago

steog88 commented 4 years ago

I have a few applications that use the OAI API (http://old.inspirehep.net/oai2d?verb=Identify) to bulk-download recently modified records (I am mostly interested in articles, though). I was wondering whether the OAI interface will be disabled once the new API is fully operational, and how to use the new one to obtain only the records that were added or modified in a given time interval (between two specific dates, say). Is that done through the "date" search keyword?

I also look forward to seeing a way to reduce the metadata content in the new API response. The current one contains much more than I need in most cases. Actually, I think a first approach would be to allow users to simply exclude metadata from the output (id, creation date -- and possibly citation counts -- can be enough!), while a more powerful filtering of the schema may be implemented later.

michamos commented 4 years ago

We're currently investigating whether to add an OAI-PMH API to the new platform. As we only had a handful of regular users of it on the old website, it's not certain we'll add it. Could you tell us more about your use case? There might be a better way to replicate it with the API.

The date keyword looks at all dates in the record metadata, not the modification date. So changing, say, only the references wouldn't affect it, as that doesn't change any of the dates in the record. In principle, the du keyword (for "date-updated") can be used to filter on the most recent modification date, but it's currently unreliable due to the frequent re-syncing with the old system, which causes spurious modifications of records.
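As a rough illustration of the du filter mentioned above, the following Python sketch builds a search URL for records modified between two dates. The endpoint path `/api/literature`, the `q` and `size` parameter names, and the `du >= ... and du <= ...` range syntax are assumptions that are not confirmed in this thread; check them against the API documentation before relying on them.

```python
from datetime import date
from urllib.parse import urlencode

# Assumed endpoint; not confirmed in this thread.
BASE_URL = "https://inspirehep.net/api/literature"


def build_du_query(start: date, end: date) -> str:
    """Return a search string selecting records updated in [start, end].

    The range syntax used here is an assumption about the search grammar.
    """
    if start > end:
        raise ValueError("start must not be after end")
    return f"du >= {start.isoformat()} and du <= {end.isoformat()}"


def build_search_url(start: date, end: date, size: int = 25) -> str:
    """Build the full request URL; `q` and `size` are assumed parameter names."""
    params = {"q": build_du_query(start, end), "size": size}
    # urlencode percent-encodes the comparison operators and spaces.
    return f"{BASE_URL}?{urlencode(params)}"


print(build_search_url(date(2020, 5, 1), date(2020, 5, 2)))
```

The sketch only constructs the URL and performs no network request, so it can be adapted to whatever HTTP client is already in use.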

OTOH, the filtering of metadata responses has been requested often and we'll probably add it in the near future.

steog88 commented 4 years ago

OK, thanks for the information. Concerning the use case, we have local databases of BibTeX entries that we take from INSPIRE. We usually add records to the database when they appear on arXiv, and every day we check the recent modifications to detect which records were published in the previous 24 hours, so that the local information is kept up to date automatically. As far as I understand, even if there are spurious modifications, collecting the cumulative results of queries with du should work. At most, I will have bigger downloads than really needed. This is one use of the OAI API. The other one is basically trivial: I was reading some information (creation date, publication date, old keys, ADS/arXiv/DOI identifiers and more) from the MARCXML output, but I can use the new JSON one instead. I will just have to spend time changing the code.
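The cumulative approach described above can be sketched as a merge step that tolerates spurious modifications: records fetched by successive du queries are keyed by identifier, so a record reported more than once is overwritten rather than duplicated. The record layout used here (dicts with `id`, `updated`, and `doi` keys) and the sample values are hypothetical and chosen only for illustration.

```python
from typing import Dict, Iterable, List


def merge_batches(
    local: Dict[str, dict], batches: Iterable[List[dict]]
) -> Dict[str, dict]:
    """Merge downloaded batches into a copy of the local store.

    Each record is a dict with hypothetical "id" and "updated" keys,
    where "updated" is an ISO 8601 timestamp string in a single fixed
    format (so plain string comparison orders them correctly). A record
    replaces the stored one only if it is at least as recent, which
    makes repeated or spurious re-downloads harmless.
    """
    merged = dict(local)
    for batch in batches:
        for record in batch:
            key = str(record["id"])
            current = merged.get(key)
            if current is None or record["updated"] >= current["updated"]:
                merged[key] = record
    return merged


# Placeholder data: record 1 is seen twice, the second time with a DOI.
day1 = [{"id": 1, "updated": "2020-05-01T10:00:00", "doi": None}]
day2 = [
    {"id": 1, "updated": "2020-05-02T08:00:00", "doi": "10.0000/example"},
    {"id": 2, "updated": "2020-05-02T09:00:00", "doi": None},
]
store = merge_batches({}, [day1, day2])
print(sorted(store))
```

Keeping the newest copy per identifier means the only cost of spurious modifications is the extra download volume, which matches the expectation stated above.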

michamos commented 3 years ago

@steog88 as far as I understand, you have found a solution using the current API, so I'll close this issue, but feel free to comment/reopen if that's not the case. For the filtering of API results, we're currently working on it and it will be available soon.