inspirehep / rest-api-doc

Documentation of the INSPIRE REST API
https://inspirehep.net
Creative Commons Attribution Share Alike 4.0 International

Citesummary #3

Open paurkedal opened 4 years ago

paurkedal commented 4 years ago

Is there an efficient way to extract the equivalent of the old site's Citesummary? In particular, we have been using queries like http://old.inspirehep.net/search?ln=en&ln=en&p=find+cn+atlas+and+d+2019&of=hcs&action_search=Search&sf=&so=d&rm=&rg=25&sc=0 to extract annual metrics for the ATLAS and ALICE collaborations.

The solution I can see with the documented API is to request the full set of entries and fetch the citations entry of each paper. That may be feasible if we cache time-sliced partial results as we update, though I'm hoping there is a better way.
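A minimal sketch of that approach, for illustration only. It assumes the literature endpoint at `https://inspirehep.net/api/literature` accepts `q`, `fields` and `size` parameters, that each hit carries `metadata.citation_count`, and that pages are chained through `links.next`. The function names `iter_citation_counts` and `summarize` are hypothetical, and the metrics are a rough stand-in for what Citesummary reported, not a reproduction of it.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint; check the API documentation before relying on it.
API = "https://inspirehep.net/api/literature"


def iter_citation_counts(query, page_size=250):
    """Yield the citation count of every record matching `query`.

    Assumes the response layout hits.hits[].metadata.citation_count
    and a links.next URL for the following page.
    """
    params = urllib.parse.urlencode(
        {"q": query, "fields": "citation_count", "size": page_size}
    )
    url = f"{API}?{params}"
    while url:
        with urllib.request.urlopen(url) as resp:
            page = json.load(resp)
        for hit in page["hits"]["hits"]:
            yield hit["metadata"].get("citation_count", 0)
        # Stop when the server no longer advertises a next page.
        url = page.get("links", {}).get("next")


def summarize(counts):
    """Reduce per-paper citation counts to a few summary metrics."""
    counts = sorted(counts, reverse=True)
    total = sum(counts)
    # h-index: number of papers whose count is at least their rank.
    h_index = sum(1 for rank, c in enumerate(counts, start=1) if c >= rank)
    average = total / len(counts) if counts else 0.0
    return {
        "papers": len(counts),
        "citations": total,
        "average": average,
        "h_index": h_index,
    }
```

Usage would be along the lines of `summarize(iter_citation_counts("find cn atlas and d 2019"))`, assuming the new search accepts the same query string as the old site.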

michamos commented 4 years ago

Currently we don't have a better way, and it will require thousands of requests to compute those stats for such large experiments. We will probably expose the citation summary we're using on the website (as appears here) through the API at some point, but I can't tell you when that will happen, as it's a bit trickier than anticipated.

paurkedal commented 4 years ago

Thanks for the info. The website renders the numbers with JavaScript, so it does not look like we can resurrect our solution of parsing HTML. I might still look into computing it, since the rate of requests should stay limited if we store intermediate results per day, but it's not so urgent that it can't wait a few months.
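The per-day caching idea could look roughly like the sketch below. Everything here is hypothetical: the cache is a plain dict from a date string to the list of citation counts fetched for that day, `fetch_day` is a caller-supplied function that performs the actual API requests, and the JSON file layout is just one convenient choice.

```python
import json
from pathlib import Path


def load_cache(path):
    """Read the cache from a JSON file, or start empty if it is missing."""
    p = Path(path)
    return json.loads(p.read_text()) if p.exists() else {}


def save_cache(path, cache):
    """Write the cache back to disk as JSON."""
    Path(path).write_text(json.dumps(cache))


def update_cache(cache, days, fetch_day):
    """Fetch only the days not yet cached; return the days that were fetched."""
    fetched = []
    for day in days:
        if day not in cache:
            # fetch_day(day) is assumed to return the citation counts
            # of the papers dated on that day.
            cache[day] = list(fetch_day(day))
            fetched.append(day)
    return fetched
```

One caveat with this design: citation counts keep growing, so cached days would need to be refreshed from time to time to stay accurate.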

paurkedal commented 4 years ago

As long as the old site is operational, we can still use our current solution though.