lukasschwab / arxiv.py

Python wrapper for the arXiv API
MIT License

Missing "start" param #121

Closed tendau closed 1 year ago

tendau commented 1 year ago

Motivation

I may be missing something, but I do not see any way to add a start or page parameter to the search query. This would be helpful for paginating results.

Solution

The ability to add a start parameter to a search query.

lukasschwab commented 1 year ago

You should be able to specify it through the named parameter offset of the results function.

lukasschwab commented 1 year ago

Now that I'm at a keyboard, some more color!

Docs for the results functions:

offset is specified outside of the query because it only affects which subset of the results is returned. Contrast that with max_results, which caps the total number of records in the result set, counting from the first record that matches the query.

For example, setting offset here reduces the number of returned items:

>>> import arxiv
>>>
>>> # NOTE: offset=0 is the default behavior.
>>> [item.get_short_id() for item in arxiv.Search(query="test", max_results=10).results(offset=0)]
['1802.07361v1', '1912.09881v1', '1606.00288v1', '0710.4669v1', '2010.13410v1', 'math/0207300v1', '1506.01646v1', '1612.04351v1', '1908.07145v1', '2211.13622v1']
>>>
>>> [item.get_short_id() for item in arxiv.Search(query="test", max_results=10).results(offset=5)]
['math/0207300v1', '1506.01646v1', '1612.04351v1', '1908.07145v1', '2211.13622v1']
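
To make the interaction between the two parameters concrete, here is a small pure-Python model of the behavior shown in the session above. It is only a sketch inferred from that output, not the library's implementation; results_model and its arguments are hypothetical names, and the IDs are placeholders.

```python
from itertools import islice


def results_model(matches, max_results, offset=0):
    """Hypothetical model of the semantics observed above.

    max_results caps the result set first; offset then skips
    records from the front of that capped set.
    """
    capped = islice(matches, max_results)
    return list(islice(capped, offset, None))


# Placeholder IDs standing in for every record that matches a query.
ids = [f"id{i}" for i in range(25)]

first = results_model(ids, max_results=10)
later = results_model(ids, max_results=10, offset=5)

print(len(first), len(later))
print(later == first[5:])
```

Under this model, asking for max_results=10 with offset=5 gives the last five records of the same ten, which matches the two lists in the session.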

.results(...) yields results until the end of the result set (nothing else matches the query), so in general the caller shouldn't have to set an initial offset. I can see it being useful for resuming an interrupted long-running query, though!
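
For the resuming case mentioned above, one possible pattern is to count how many results have been handled and pass that count back in as the offset after a failure. The sketch below is an assumption-laden illustration: consume_with_resume, fetch, and handle are hypothetical names, the choice of ConnectionError as the failure signal is a guess, and the flaky generator only simulates a dropped connection.

```python
def consume_with_resume(fetch, handle, max_retries=3):
    """Process every item, restarting from the last handled position.

    fetch(offset) is assumed to return an iterator that starts at
    the given offset; handle(item) processes one result.
    """
    processed = 0
    retries = 0
    while True:
        try:
            for item in fetch(processed):
                handle(item)
                processed += 1
            return processed
        except ConnectionError:
            # Assumed failure type; give up after max_retries restarts.
            retries += 1
            if retries > max_retries:
                raise


# Simulated result set and a fetch that fails once at position 6.
data = list(range(10))
state = {"failed": False}


def flaky_fetch(offset):
    for i, item in enumerate(data[offset:], start=offset):
        if i == 6 and not state["failed"]:
            state["failed"] = True
            raise ConnectionError("simulated drop")
        yield item


seen = []
total = consume_with_resume(flaky_fetch, seen.append)
print(total, seen)
```

With the library, fetch could plausibly be something like lambda offset: arxiv.Search(query="test", max_results=100).results(offset=offset), reusing the call shape from the session earlier in this thread.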

Closing now, but feel free to tell me a little more about your use case!