lukasschwab / arxiv.py

Python wrapper for the arXiv API
MIT License

Include `start` as an argument so that we can use paging for large numbers of results #32

Closed Santosh-Gupta closed 5 years ago

Santosh-Gupta commented 5 years ago

Is your feature request related to a problem? Please describe.

The arXiv API has an argument called start which, when used in conjunction with max_results, allows you to page through a large number of results. This matters because the maximum number of results is 30,000, and they recommend 1,000.

https://arxiv.org/help/api/user-manual#paging

Many times there are hundreds of results for an API query. Rather than download information about all the results at once, the API offers a paging mechanism through start and max_results that allows you to download chunks of the result set at a time. Within the total results set, start defines the index of the first returned result, using 0-based indexing. max_results is the number of results returned by the query. For example, if we wanted to step through the results of a search_query of all:electron, we would construct the urls:
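The paging scheme quoted above can be sketched in a few lines of Python. This is only an illustration, not the manual's own example: the helper name `page_urls` and its `total` and `page_size` parameters are made up here, and the base URL is assumed to be the public `export.arxiv.org/api/query` endpoint.

```python
from urllib.parse import urlencode

# Assumed endpoint for the arXiv API; adjust if you use a mirror.
BASE_URL = "http://export.arxiv.org/api/query"


def page_urls(search_query, total, page_size):
    """Build one query URL per page (hypothetical helper, not part of arxiv.py).

    start is the 0-based index of the first result on each page;
    max_results is how many results that page should return.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    urls = []
    for start in range(0, total, page_size):
        params = {
            "search_query": search_query,
            "start": start,
            # The last page may be shorter than page_size.
            "max_results": min(page_size, total - start),
        }
        # safe=":" keeps the field prefix (all:electron) readable.
        urls.append(BASE_URL + "?" + urlencode(params, safe=":"))
    return urls


urls = page_urls("all:electron", total=30, page_size=10)
for url in urls:
    print(url)
```

Each URL differs only in its start value (0, 10, 20), which is the argument this issue asks the wrapper to expose.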

Describe the solution you'd like

Add start to the list of arguments.

lukasschwab commented 5 years ago

Yep, looks like this got lost in the 0.4.0 rework: start is being hardcoded to 0.

Very happy to review a PR that fixes this, or to fix it myself (perhaps sometime in the next two weeks).

As a stopgap: pre-0.4.0 releases (like release 0.3.1, with documentation here) include the start parameter.

Santosh-Gupta commented 5 years ago

I would love to give it a shot!

From going through the repo, it looks like all I have to do is change this file

https://github.com/lukasschwab/arxiv.py/blob/master/arxiv/arxiv.py

So in that file, wherever there's a max_results, there should be a start as well? So basically just add start, if there isn't already one, wherever there's a max_results?

Santosh-Gupta commented 5 years ago

So basically I'm adding a start wherever there's a max_results, but I -think- on line 145 I need to do n_left = self.max_results - self.start

Here's what I have done

https://github.com/Santosh-Gupta/arxiv.py/blob/master/arxiv/arxiv.py
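For the n_left question, here is a minimal sketch of how a chunked download loop might honor a start offset. It is not the library's actual code: `plan_pages` and `page_size` are hypothetical names. It assumes the reading from the manual excerpt quoted earlier, where max_results is a count of results and not an end index. Under that assumption start only shifts the first offset and is not subtracted from n_left.

```python
def plan_pages(start, max_results, page_size):
    """Return (offset, count) pairs for a chunked download.

    Hypothetical helper for illustration only. Assumes max_results is the
    number of results wanted, so the last index fetched is
    start + max_results - 1.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    pages = []
    offset = start          # first request begins at the caller's start
    n_left = max_results    # a count, so start is not subtracted here
    while n_left > 0:
        count = min(page_size, n_left)
        pages.append((offset, count))
        offset += count     # the next request starts where this one ended
        n_left -= count
    return pages


print(plan_pages(start=20, max_results=25, page_size=10))
```

If max_results were instead treated as an absolute end index, n_left = max_results - start would be the right initial value, so the choice depends on which meaning the wrapper wants to expose.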