lukasschwab / arxiv.py

Python wrapper for the arXiv API
MIT License

Update arxiv.py #33

Closed Santosh-Gupta closed 5 years ago

Santosh-Gupta commented 5 years ago

Added the start parameter.

Description

Breaking changes

List any changes that break the API usage supported on master.

Relevant issues

List GitHub issues relevant to this change.

Checklist

Santosh-Gupta commented 5 years ago

I made all the changes except writing tests for the start parameter. I took a look at

https://github.com/Santosh-Gupta/arxiv.py/blob/master/tests/test_search.py

and I am guessing I just add the start parameter wherever there is a max_result parameter?

But I am not sure how to handle lines 71-73:

    for k, v in parse_qsl(url.split("?")[1]):
        if k == "max_results":
            max_result = int(v)
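
One possible way to handle that loop is to read start from the query string the same way max_results is read. This is only a minimal sketch, assuming the mocked request URL carries both parameters. The helper name parse_paging_params, its default values, and the example URL are illustrative and not taken from the test file.

```python
from urllib.parse import parse_qsl, urlsplit


def parse_paging_params(url, default_start=0, default_max_results=10):
    """Return (start, max_results) read from a URL's query string.

    Hypothetical helper: the name and the defaults are assumptions.
    """
    start, max_results = default_start, default_max_results
    # urlsplit(...).query is the part after "?", or "" if there is none.
    for k, v in parse_qsl(urlsplit(url).query):
        if k == "start":
            start = int(v)
        elif k == "max_results":
            max_results = int(v)
    return start, max_results


# Example URL made up for illustration.
url = (
    "http://export.arxiv.org/api/query"
    "?search_query=all:electron&start=20&max_results=5"
)
print(parse_paging_params(url))
```

A mocked response could then slice its canned entries with the parsed values (entries[start:start + max_results]), so the tests exercise the offset as well as the page size.
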
Santosh-Gupta commented 5 years ago

I added time_sleep as a parameter for query, because sometimes it skips results, so I am guessing time_sleep = 3 may be too short. I am experimenting with time_sleep = 5.

Edit:

Even raising the sleep time to 10 brings in more results. I am experimenting with a high volume of results, though: about 80,000.

Edit:

It looks like the API results are just inconsistent. I'm not sure if the sleep time has an effect. I think the only reliable way is to run the query a few times, switching between descending and ascending order, and appending results that are not already present.
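
The merge step of that workaround could look roughly like the sketch below. It assumes each run has already been fetched into a list and that every result carries a unique identifier. The merge_unique helper and the stand-in dictionaries are hypothetical; no real API calls are made.

```python
def merge_unique(runs, key):
    """Merge several result lists, keeping the first copy of each key."""
    seen = set()
    merged = []
    for run in runs:
        for item in run:
            k = key(item)
            if k not in seen:
                seen.add(k)
                merged.append(item)
    return merged


# Stand-in data: two passes over the same query, each missing one entry.
descending = [{"id": "c"}, {"id": "b"}]  # this pass dropped "a"
ascending = [{"id": "a"}, {"id": "c"}]   # this pass dropped "b"

merged = merge_unique([descending, ascending], key=lambda r: r["id"])
print([r["id"] for r in merged])
```

Repeating the passes until the merged list stops growing gives a rough stopping rule, though it cannot prove that nothing is still missing.
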

lukasschwab commented 5 years ago

Thanks for the extra work here! I'm going to do some work here (including reverting the changes to the time_sleep logic, since you concluded it doesn't make a consistent difference), then merge and roll a new release.

Cheers!