lukasschwab / arxiv.py

Python wrapper for the arXiv API
MIT License
1.07k stars, 120 forks

Retrieve publications between two dates #104

Closed: kennethleungty closed this issue 1 year ago

kennethleungty commented 1 year ago

It would be useful if we could retrieve publications between two dates, or use some other form of date selection (e.g., by month and year). Is that currently possible?

lukasschwab commented 1 year ago

Hi @kennethleungty, unfortunately the arXiv API doesn't offer a way to search by date fields. The complete list of individually searchable fields is in the arXiv API User's Manual.

To pull all the results in a certain date range, I'd recommend passing `SortCriterion.SubmittedDate` as the `sort_by` parameter in your search, then processing results in order: with the default descending sort, skip results newer than your range, keep the ones inside it, and stop once you've passed the start of the range or exhausted the result set.
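Here is a minimal sketch of that filtering loop. It assumes the results arrive newest-first and that each result's `published` attribute follows the same ordering as the `SubmittedDate` sort; `results_in_range` and `FakeResult` are hypothetical names for illustration, not part of arxiv.py. The loop is stdlib-only so it can be tried without hitting the API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Iterable, Iterator


def results_in_range(results: Iterable[Any], start: datetime, end: datetime) -> Iterator[Any]:
    """Yield results whose `published` date falls in [start, end].

    Assumes `results` is ordered newest-first (a SubmittedDate sort in
    descending order). Because of that ordering, iteration stops at the
    first result older than `start`, so no further pages get requested.
    """
    for result in results:
        published = result.published
        if published > end:
            continue  # newer than the range: keep paging toward it
        if published < start:
            break  # older than the range: nothing after this can match
        yield result


@dataclass
class FakeResult:
    """Hypothetical stand-in for arxiv.Result, used only for this demo."""
    title: str
    published: datetime


if __name__ == "__main__":
    utc = timezone.utc
    feed = [
        FakeResult("newest", datetime(2023, 3, 1, tzinfo=utc)),
        FakeResult("in range", datetime(2023, 2, 1, tzinfo=utc)),
        FakeResult("oldest", datetime(2022, 12, 1, tzinfo=utc)),
    ]
    start = datetime(2023, 1, 1, tzinfo=utc)
    end = datetime(2023, 2, 28, tzinfo=utc)
    for r in results_in_range(feed, start, end):
        print(r.title)
```

With arxiv.py itself, you would presumably feed it `arxiv.Client().results(search)` where `search = arxiv.Search(query="cat:cs.CL", sort_by=arxiv.SortCriterion.SubmittedDate)` (the query is only an example), leaving `max_results` unbounded so paging isn't cut short. If `published` comes back timezone-aware, as it appears to in recent releases, pass timezone-aware bounds too, since Python refuses to compare aware and naive datetimes.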

If you're dealing with very large searches and don't want to paginate all the way to your date range, you may be able to use the total result count and offsets to binary-search for your range in the result stream. It still isn't as good as having searchable date fields, but it would cut your worst-case request count from $n$ to roughly $\log_2(n)$. I'm not sure this package conveniently exposes the total result count, though.
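A sketch of that binary search follows. It rests on two assumptions that arxiv.py may not satisfy directly: the total result count is known (the raw API's Atom feed reports one as `opensearch:totalResults`), and there is some way to fetch the single result at a given offset, for example a raw query with `start=offset&max_results=1`. Here that lookup is a hypothetical `date_at(offset)` callable, and `first_true` / `offset_range` are illustrative names.

```python
from datetime import datetime
from typing import Callable, Tuple


def first_true(pred: Callable[[int], bool], total: int) -> int:
    """Smallest offset in [0, total) where pred holds, or total if none.

    Requires pred to be monotone: False for a prefix of offsets, True after.
    Each pred call is assumed to cost one API request; the loop makes at
    most ceil(log2(total + 1)) calls.
    """
    lo, hi = 0, total
    while lo < hi:
        mid = (lo + hi) // 2
        if pred(mid):
            hi = mid  # boundary is at mid or earlier
        else:
            lo = mid + 1  # boundary is after mid
    return lo


def offset_range(
    date_at: Callable[[int], datetime], total: int, start: datetime, end: datetime
) -> Tuple[int, int]:
    """Return (begin, stop) so offsets begin..stop-1 cover dates in [start, end].

    `date_at(offset)` is a hypothetical helper returning the submitted date
    of the result at that offset in a newest-first result stream.
    """
    begin = first_true(lambda i: date_at(i) <= end, total)  # first result not newer than end
    stop = first_true(lambda i: date_at(i) < start, total)  # first result older than start
    return begin, stop
```

Once `(begin, stop)` is known, you would page normally from offset `begin` and stop before `stop`. The two searches cost about $2\log_2(n)$ single-result requests instead of walking every page before the range.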