inspirehep / rest-api-doc

Documentation of the INSPIRE REST API
https://inspirehep.net
Creative Commons Attribution Share Alike 4.0 International

Search by author-count and reduced-size JSON output #8

Closed: agbuckley closed this issue 3 years ago

agbuckley commented 4 years ago

I'm trying to port the scripts that we use to build the Rivet analysis-coverage pages (e.g. https://rivet.hepforge.org/rivet-coverage) to the new API, and have got stuck replicating the old-style API calls. Our previous URL structure was

https://old.inspirehep.net/search?ln=en&ln=en&p=find+cn+${EXPT}+and+ac+${NAUTH}%2B+and+de+${YEAR}&of=xm&action_search=Search&sf=earliestdate&so=d&rm=&rg=250&sc=0&ot=001,024,035,037,710,245

in which we ran over a list of collaborations to get the $EXPT values, downloaded each $YEAR separately, set $NAUTH to 15 or 100 depending on experiment size, and used the ot restriction to reduce the resulting XML to only the fields we need (with full author and citation information it was far too big).

I think I've replicated the collaboration and year searches with this URL: https://inspirehep.net/api/literature?size=200&q=collaborations.value:ATLAS%20and%20imprints.date:2019, but I'm not sure about the rest. Here are my questions:

Thanks!

michamos commented 4 years ago

We try to maintain compatibility with the old searches as much as possible, so the search query find+cn+${EXPT}+and+ac+${NAUTH}%2B+and+de+${YEAR} you were doing should still work (with q= instead of p=).
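As a minimal sketch of what "with q= instead of p=" means in practice, the helper below builds a new-API URL from the old query string. The function name and the example thresholds are hypothetical; only the endpoint, the size parameter and the query syntax come from this thread. Python's urlencode turns spaces into + and the literal + after $NAUTH into %2B, matching the old URL.

```python
from urllib.parse import urlencode

# Endpoint taken from the URL quoted earlier in this thread.
API_URL = "https://inspirehep.net/api/literature"


def coverage_search_url(expt: str, nauth: int, year: int, size: int = 200) -> str:
    """Build a new-API search URL from the old-style query (hypothetical helper).

    The query text is the old 'find cn ... and ac N+ and de YEAR' search,
    passed as q= instead of p=.
    """
    query = f"find cn {expt} and ac {nauth}+ and de {year}"
    params = {"q": query, "size": size}
    # urlencode quotes spaces as '+' and the literal '+' as '%2B'.
    return f"{API_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical thresholds: the thread only says 15 or 100 depending on size.
    thresholds = {"ATLAS": 100, "H1": 15}
    for expt, nauth in thresholds.items():
        for year in (2018, 2019):
            print(coverage_search_url(expt, nauth, year))
```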

  • I'm not certain that this date restriction excludes erratum dates, as we would want: does it, or do we need something else?

de/earliest_date is still supported, but it uses the earliest date in the record. So it might not be exactly what you're asking for (the publication date), as the earliest date will often be the arXiv date. If you want the publication date, you can use jy/journal-year, but it will also match errata, addenda, etc., which you'll have to filter out at a later stage.
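One way to do that later filtering is sketched below. It assumes, without confirmation from this thread, that each record's metadata carries a publication_info list whose entries may have "year" and "material" keys, with errata and addenda labelled via "material". The function name and the set of excluded labels are hypothetical; check them against real records before relying on this.

```python
from typing import Any, Optional

# Assumed "material" labels for entries that are not the main publication.
NON_PRIMARY_MATERIALS = {"erratum", "addendum"}


def primary_publication_year(metadata: dict[str, Any]) -> Optional[int]:
    """Return the journal year of the main publication, ignoring errata etc.

    Assumes metadata["publication_info"] is a list of dicts that may carry
    "year" and "material" keys; entries without "material" count as primary.
    Returns None when no primary entry has a year.
    """
    years = [
        info["year"]
        for info in metadata.get("publication_info", [])
        if "year" in info
        and info.get("material", "publication") not in NON_PRIMARY_MATERIALS
    ]
    # If several primary entries exist, take the earliest year.
    return min(years) if years else None
```

A jy 2019 search can then be post-filtered by keeping only records for which primary_publication_year(hit["metadata"]) == 2019.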

  • I can't see how to restrict the author count with this query syntax: there is an author_count field in the JSON (but not the schema), but it's set to 0 in all cases as far as I can see. Can you tell me how to do this: maybe some syntax for the size of a collection element? Or another way of excluding single/few-person proceedings etc. on behalf of collaborations: we only want the "official", full-collaboration papers.

You can still use ac/author_count to filter on the number of authors, as on the previous website, and the author_count field inside metadata should be set to the right value in the response.
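Since author_count is returned in each record's metadata, a client-side check can confirm that the ac filter did its job, or serve as a fallback if the values look wrong. This is a sketch assuming the search response has the shape {"hits": {"hits": [{"metadata": {...}}, ...]}}; the function name is hypothetical.

```python
from typing import Any


def full_collaboration_papers(
    response: dict[str, Any], min_authors: int
) -> list[dict[str, Any]]:
    """Keep only hits whose metadata.author_count is at least min_authors.

    Assumes the response shape {"hits": {"hits": [{"metadata": {...}}, ...]}}.
    Hits without an author_count are treated as having 0 authors.
    """
    hits = response.get("hits", {}).get("hits", [])
    return [
        hit
        for hit in hits
        if hit.get("metadata", {}).get("author_count", 0) >= min_authors
    ]
```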

  • I also don't see anything about an equivalent to the old ot field that allowed us to reduce the output size... in particular we don't need or want the huge author list or citation blocks: can we exclude them somehow?

We know that this is annoying. It's not supported yet but will be added soon.
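Until server-side field selection is available, one stopgap is to strip the unwanted blocks right after download so that the cached files stay small; note that this does not reduce the transfer itself. The sketch below assumes the same response shape as in the previous sketch, and the mapping of the old ot=001,024,035,037,710,245 MARC fields to JSON keys is an assumption to verify against real records. The function name is hypothetical.

```python
from typing import Any

# Assumed JSON counterparts of the old ot=001,024,035,037,710,245 fields
# (record ID, DOI, external IDs, arXiv eprint, collaboration, title).
KEEP_FIELDS = (
    "control_number",
    "dois",
    "external_system_identifiers",
    "arxiv_eprints",
    "collaborations",
    "titles",
)


def prune_hits(
    response: dict[str, Any], keep: tuple[str, ...] = KEEP_FIELDS
) -> list[dict[str, Any]]:
    """Reduce each hit's metadata to the listed keys.

    Everything else (author lists, references, citation data, ...) is dropped.
    Keys missing from a record are simply skipped.
    """
    pruned = []
    for hit in response.get("hits", {}).get("hits", []):
        metadata = hit.get("metadata", {})
        pruned.append({key: metadata[key] for key in keep if key in metadata})
    return pruned
```

The pruned list can then be written out with json.dump in place of the full response.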