lukasschwab / arxiv.py

Python wrapper for the arXiv API
MIT License

Support date queries #23

Closed. jasonmcewen closed this issue 4 years ago.

jasonmcewen commented 5 years ago

Thanks for the great code!

Would it be possible to specify a to/from date range for queries?

Surprisingly, I don't see support for this in the arXiv API documentation.

My use case is that I'd like to use arxiv.py to regularly check for new arXiv articles matching given search criteria. I plan to run this periodically, so I'd like each run's search queries to return only articles published since the last run.

Thanks for any help!

lukasschwab commented 5 years ago

Hi @jasonmcewen! Thanks for the request. Since date-range queries aren't supported by the arXiv API, I don't plan to implement them in this package in the short term. I think preserving the requested sortBy while applying this filter would mean an O(n) iteration over all the query results.

That iteration also requires iteratively requesting several pages of results; this has subrequirements:

I think this kind of auto-pagination would be slick (especially because it enables enhancements like date ranges!), so I'll think about adding it in the future. For the time being, I hesitate to overhaul this package so dramatically.
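To illustrate the O(n) cost, here is a minimal sketch in plain Python that makes no arXiv calls. The Paper dicts and their "published" field are hypothetical stand-ins for query results. With an arbitrary sort order, every result has to be inspected. Only when results happen to be sorted by date, newest first, can the scan stop at the first paper older than the range.

```python
from itertools import takewhile
from typing import Dict, Iterable, Iterator

# Hypothetical stand-in for a query result: only the "published" string
# matters here. ISO 8601 strings in one fixed format (e.g.
# "2019-05-14T17:58:38Z") sort chronologically, so plain string comparison
# works as a date comparison.
Paper = Dict[str, str]


def filter_any_order(papers: Iterable[Paper], since: str) -> Iterator[Paper]:
    """Results in arbitrary order (e.g. by relevance): all n must be checked."""
    return (p for p in papers if p["published"] >= since)


def filter_newest_first(papers: Iterable[Paper], since: str) -> Iterator[Paper]:
    """Results sorted newest first: stop at the first paper older than since."""
    return takewhile(lambda p: p["published"] >= since, papers)
```

Stopping early only works when the results are sorted by date. That gives up the caller's requested sortBy (for example relevance), which is why a filter that preserves sortBy has to scan everything.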

lukasschwab commented 4 years ago

> That iteration also requires iteratively requesting several pages of results

This feature was added in https://github.com/lukasschwab/arxiv.py/pull/14!
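Requesting several pages in sequence can be pictured as a generator that fetches fixed-size pages lazily and stops when a page comes back short. This is a minimal sketch of the general idea, not the code from that pull request. fetch_page(start, max_results) is a hypothetical callable standing in for one API request using the arXiv API's start and max_results parameters. The page size of 100 is an arbitrary choice, and the 3-second default delay follows the arXiv API manual's request that clients pause between consecutive calls.

```python
import time
from typing import Callable, Iterator, List, TypeVar

T = TypeVar("T")


def paginate(
    fetch_page: Callable[[int, int], List[T]],
    page_size: int = 100,
    delay_seconds: float = 3.0,
) -> Iterator[T]:
    """Yield results one at a time, requesting pages only as they are needed.

    fetch_page(start, max_results) is a hypothetical callable that performs
    one API request and returns at most max_results items starting at
    offset start.
    """
    start = 0
    while True:
        page = fetch_page(start, page_size)
        yield from page
        if len(page) < page_size:
            return  # a short (or empty) page means there are no more results
        start += page_size
        time.sleep(delay_seconds)  # pause between consecutive requests
```

Because the generator is lazy, a consumer that stops after a few results never triggers requests for later pages.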

Rather than hiding the O(n) operation inside the package, I recommend applying the filter to the query-results iterator yourself:

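import arxiv  # the pre-1.0 arxiv.py API discussed in this thread provides arxiv.query
from datetime import date

# Example window: keep papers published on or after this date.
SINCE = date(2019, 1, 1)

# Some condition on the string date, e.g. "2019-05-14T17:58:38Z";
# its first 10 characters are the calendar date.
def in_range(published):
    return date.fromisoformat(published[:10]) >= SINCE

# With iterative=True, arxiv.query returns a callable; calling it yields the results.
result = arxiv.query(query="quantum", max_results=100, iterative=True)
for paper in result():
    if in_range(paper.published):
        print(paper)

# Alternatively, if you want to have the full filtered list at once:
result = arxiv.query(query="quantum", max_results=100, iterative=True)
all_filtered_papers = [p for p in result() if in_range(p.published)]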
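For the periodic "since the last run" use case from the original question, the same kind of filter can be driven by a timestamp saved between runs. This is a minimal sketch that leaves the query itself out. It assumes each result exposes an ISO 8601 published string such as "2019-05-14T17:58:38Z". STATE_FILE, load_last_run, save_last_run, parse_published and new_since_last_run are hypothetical names used only for illustration.

```python
import json
from datetime import datetime
from pathlib import Path
from typing import Iterable, List, Optional

STATE_FILE = Path("last_run.json")  # hypothetical location for saved state


def load_last_run(path: Path = STATE_FILE) -> Optional[datetime]:
    """Return the timezone-aware timestamp saved by the previous run, or None."""
    if not path.exists():
        return None
    return datetime.fromisoformat(json.loads(path.read_text())["last_run"])


def save_last_run(when: datetime, path: Path = STATE_FILE) -> None:
    """Persist a timezone-aware timestamp (e.g. datetime.now(timezone.utc))."""
    path.write_text(json.dumps({"last_run": when.isoformat()}))


def parse_published(published: str) -> datetime:
    # Assumes strings like "2019-05-14T17:58:38Z". "Z" is rewritten as
    # "+00:00" because datetime.fromisoformat only accepts "Z" from Python 3.11.
    return datetime.fromisoformat(published.replace("Z", "+00:00"))


def new_since_last_run(papers: Iterable, last_run: Optional[datetime]) -> List:
    """Keep papers published after last_run (every paper on the first run)."""
    if last_run is None:
        return list(papers)
    return [p for p in papers if parse_published(p.published) > last_run]
```

On each run, record datetime.now(timezone.utc) before querying, pass load_last_run() as last_run, and call save_last_run with the recorded time once the new papers have been handled. Papers may become visible in the API some time after their published timestamp, so a small overlap window between runs may be worth considering.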